Abstract

We introduce and study finite d-volumes, the high-dimensional generalization of finite metric spaces.
Having developed a suitable combinatorial machinery, we define $\ell_1$-volumes and show that they contain
Euclidean volumes and hypertree volumes. We show that they can approximate any d-volume with $O(n^d)$
multiplicative distortion. On the other hand, contrary to Bourgain's theorem for d = 1, there exists a
2-volume on n vertices that cannot be approximated by any $\ell_1$-volume with distortion smaller than
$\tilde{\Omega}(n^{1/5})$.

We further address the problem of $\ell_1$-dimension reduction in the context of $\ell_1$-volumes, and show that
this phenomenon does occur, although not to the same striking degree as it does for Euclidean metrics and
volumes. In particular, we show that any $\ell_1$ metric on n points can be $(1+\epsilon)$-approximated by a sum of
$O(n/\epsilon^2)$ cut metrics, improving over the best previously known bound of $O(n \log n)$ due to Schechtman.

In order to deal with dimension reduction, we extend the techniques and ideas introduced by Karger and
Benczúr, and Spielman et al. in the context of graph sparsification, and develop general methods with a wide
range of applications.

ACM classes: G.2.0; G.2.1; F.2.2



1 Introduction 



This paper has two intertwined storylines. The first is a systematic attempt to develop a basic theory of finite 
volume spaces - a natural generalization of finite metric spaces. The second is an effort to extend the techniques 
and the ideas introduced in [7], [6], and to make them applicable to a wide class of sparsification problems. The 
synthesis of the two is reached when the resulting new sparsification methods are successfully applied in the 
context of finite volume spaces, for the $\ell_1$-dimension reduction problem.

The blossoming of the theory of metric spaces in the last two decades affected both practical and theoretical
algorithm design, and also the local theory of normed spaces. It developed its own key notions, posed intriguing
new problems, and solved many of these problems using novel methods. There is a rich interplay between
the theory of finite metric spaces and graph theory. Often the former provides a unique perspective on many
basic and important graph-theoretic notions such as cuts, flows, expansion, minors and spanners. Motivated by
all this, we introduce the abstract finite volume spaces, and attempt to use the notions, ideas and methods of
finite metric spaces in this more general setting. In doing this, we hope to contribute not only to the theory
of finite volume/metric spaces, but also to the combinatorial theory of simplicial complexes. We also get some
new geometric and algorithmic applications.

The combinatorial theory of simplicial complexes has drawn much research activity in recent years, as
witnessed, to name but a few, by the studies of random 2-dimensional complexes [16], [23], [5], [26], and the
studies of embeddability of d-complexes in Euclidean space [21]. While developing the theory of finite volume
spaces, we naturally arrive at complex-theoretic notions such as hypercuts, face expansion, and sparse spanners.
We establish some of their structural properties, and present some new constructions.

The transfer to higher dimension is not without difficulties even on the level of basic definitions. For example,
hypertrees (generalizing trees) have numerous distinct definitions; see, e.g., [25, 1, 11, 17]. Hypercuts (generalizing
cuts) remain without an explicit definition. (A number of possible definitions are discussed in this paper. See also
the supports of coboundaries of [16], and the two-graphs of Seidel [28].) In a sense, the theory of finite volume
spaces helps to make a coherent choice among possible conflicting definitions. To clarify the presentation,
we make an effort to consistently use the language of combinatorics and linear algebra instead of referring to
algebraic topology. We also try to keep the presentation self-contained, including in Section 2 some basic facts
equipped with short proofs.

Having provided the necessary combinatorial background, we embark on a systematic study of finite volumes.
In particular, using hypercuts, we define $\ell_1$-volumes, and show that they can be used to approximate any finite
volume, and that they contain the Euclidean volumes and the hypertree volumes. We show that contrary to
Bourgain's theorem for d = 1, there exists a 2-dimensional volume on n vertices that cannot be approximated
by any $\ell_1$-volume with distortion smaller than $\tilde{\Omega}(n^{1/5})$. The best corresponding upper bound we can currently
show is $O(n^2)$.

The most technically elaborate part of our study of finite volumes is the problem of $\ell_1$-dimension
reduction.

The following is known. For the Euclidean d-volumes on n points, the result of [19] (which extends the
famous Johnson-Lindenstrauss Lemma) shows that about $O(\epsilon^{-2} \log n)$ dimensions suffice for a $(1+\epsilon)$-faithful
representation. For $\ell_1$-metrics, the elegant lower bound of Brinkman and Charikar [9] (see also Lee
and Naor [15]) shows that in general, in order to get multiplicative distortion $(1+\epsilon)$ for a small $\epsilon$, one might
need as many as $n^{1/2}$ dimensions. The best corresponding upper bound is due to Schechtman [27], showing that
$C_\epsilon n \log n$ dimensions suffice to get a $(1+\epsilon)$ distortion.

We show that $\ell_1$ d-volumes can be $(1 \pm \epsilon)$-faithfully represented using $O(n^d \log n/\epsilon^2)$ hypercut d-volumes,
the high-dimensional analog of cut-metrics. This improves the trivial $O(n^{d+1})$ upper bound. Moreover, for a
natural subclass of $\ell_1$ d-volumes, we show a stronger bound of $O(n^d/\epsilon^2)$ special hypercut d-volumes. Since
for d = 1 all $\ell_1$ metrics belong to this special subclass, we obtain an $O(n/\epsilon^2)$ upper bound on the approximate
cut dimension of any $\ell_1$ metric on n points. This improves on [27] in two ways: the number of dimensions is
smaller, and each dimension is a cut-metric, a very special case of a line metric.

To deal with the dimension reduction problem, we develop general sparsification methods extending the
ideas and techniques of [7] and [6], originally aimed at graph sparsification. We believe that the resulting
methods are of independent theoretical and algorithmic interest. Section 4.2 contains a short discussion of
these methods, as well as another application to a certain natural problem about geometric discrepancy.

2 Basics of Combinatorics of Simplicial Complexes 

2.1 Cycles, Hypertrees and Coboundaries 

Let V be an underlying set of size n and let $K_n^{(d)} = \{\sigma \subseteq V : |\sigma| = d+1\}$ be the set of all d-dimensional
simplices on V. The boundary operator $\partial$ maps a d-simplex $\sigma$ to the formal sum over $\mathbb{Z}_2$ of the
(d-1)-dimensional subsimplices of $\sigma$, i.e., its faces of co-dimension 1. For a set $A \subseteq K_n^{(d)}$,
$\partial A$ is defined as $\partial A = \sum_{\sigma \in A} \partial\sigma$. By virtue
of $\mathbb{Z}_2$, this formal sum can be identified with a subset of $K_n^{(d-1)}$. It is convenient to think about $\partial$ in terms of
the $\binom{n}{d} \times \binom{n}{d+1}$ incidence matrix $M_d$ over $\mathbb{Z}_2$ whose rows are indexed by (d-1)-simplices, whose columns are
indexed by d-simplices, and where $M_d(\tau, \sigma) = 1$ if $\tau \subset \sigma$, and 0 otherwise. Then, for a set A of d-simplices it holds
that $M_d 1_A = 1_{\partial A}$.

A d-cycle $Z \subseteq K_n^{(d)}$ is a subset of d-simplices that vanishes under the boundary operator, i.e., $\partial Z = 0$, or,
equivalently, $M_d 1_Z = 0$.
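The boundary operator and the cycle condition can be made concrete with a short computation. The following Python sketch is an illustration only and not part of the development above: it assumes that a d-simplex is stored as a sorted tuple of vertices and that a chain over $\mathbb{Z}_2$ is a collection of such tuples. The names `boundary` and `is_cycle` are hypothetical helpers introduced for this example.

```python
from itertools import combinations

def boundary(chain):
    """Boundary over Z_2 of a collection of d-simplices.

    Each simplex is a sorted tuple of vertices; the result is the set of
    (d-1)-simplices that occur an odd number of times as faces.
    """
    result = set()
    for sigma in chain:
        for i in range(len(sigma)):
            face = sigma[:i] + sigma[i + 1:]
            result ^= {face}  # symmetric difference is addition over Z_2
    return result

def is_cycle(chain):
    """A chain is a d-cycle iff its boundary is empty."""
    return len(boundary(chain)) == 0

# The four triangles of a tetrahedron form a 2-cycle; a single triangle does not.
tetrahedron_faces = list(combinations(range(4), 3))
print(is_cycle(tetrahedron_faces))  # True
print(is_cycle([(0, 1, 2)]))        # False
# Two triangles sharing the edge (0, 1): the shared edge cancels.
print(sorted(boundary([(0, 1, 2), (0, 1, 3)])))
```

Using the symmetric difference of sets mirrors the identification of a formal sum over $\mathbb{Z}_2$ with a subset of $K_n^{(d-1)}$.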

Let a (spanning) d-hypertree be a maximal acyclic subset of d-simplices in $K_n^{(d)}$. It is easy to verify that,
like the usual spanning trees, d-hypertrees form the bases of a matroid, and therefore are all of the same size. Since the
set of all d-simplices containing a fixed vertex v of V is a d-hypertree, the size of any d-hypertree must be
$\binom{n-1}{d}$. Call $K \subseteq K_n^{(d)}$ homologically connected, or (without a risk of confusion with other definitions of
connectivity) just connected, if K contains a d-hypertree. (The connectivity of K is equivalent to the vanishing
of the homology and the cohomology groups $H_{d-1}(K)$, $H^{d-1}(K)$ over $\mathbb{Z}_2$, where K is treated as a
simplicial complex containing all low dimensional simplices on V.)
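As a sanity check of the hypertree size, one can compute the rank of the incidence matrix $M_d$ over $\mathbb{Z}_2$ for small n and d. The sketch below is illustrative only; it assumes vertices are labeled 0, ..., n-1 and encodes each column of $M_d$ as an integer bitmask. The helper names (`gf2_rank`, `columns`, `hypertree_size`, `star`) are hypothetical.

```python
from itertools import combinations
from math import comb

def gf2_rank(vectors):
    """Rank over Z_2 of a list of vectors encoded as integer bitmasks."""
    pivots = {}  # highest set bit -> basis vector with that leading bit
    for v in vectors:
        while v:
            top = v.bit_length() - 1
            if top not in pivots:
                pivots[top] = v
                break
            v ^= pivots[top]  # cancel the leading bit and keep reducing
    return len(pivots)

def columns(simplices_d, n):
    """Columns of M_d for the given d-simplices, as bitmasks over (d-1)-faces."""
    d = len(simplices_d[0]) - 1
    row = {tau: i for i, tau in enumerate(combinations(range(n), d))}
    cols = []
    for sigma in simplices_d:
        mask = 0
        for i in range(len(sigma)):
            mask |= 1 << row[sigma[:i] + sigma[i + 1:]]
        cols.append(mask)
    return cols

def hypertree_size(n, d):
    """Size of any d-hypertree on n vertices, i.e. the rank of M_d over Z_2."""
    return gf2_rank(columns(list(combinations(range(n), d + 1)), n))

def star(n, d, v=0):
    """All d-simplices on {0, ..., n-1} that contain the vertex v."""
    return [s for s in combinations(range(n), d + 1) if v in s]

n, d = 6, 2
print(hypertree_size(n, d), comb(n - 1, d))              # both equal 10
print(gf2_rank(columns(star(n, d), n)) == len(star(n, d)))  # the star is acyclic
```

The star of a vertex has full column rank (it is acyclic) and its size equals the rank of $M_d$, which is why it is a hypertree.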

Let $G = G_{d-1} \subseteq K_n^{(d-1)}$ be a subset of (d-1)-dimensional simplices on V. The d-coboundary B induced
by G is the set of all d-simplices $\sigma \in K_n^{(d)}$ such that the number of (d-1)-dimensional faces of $\sigma$ that belong
to G is odd. I.e., $1_G M_d = 1_B$. From this definition it is clear that d-coboundaries, like d-cycles, form a linear
space over $\mathbb{Z}_2$. A basic relation between the cycles and the coboundaries is:

Claim 2.1 For any d-cycle Z and any d-coboundary B, $|Z \cap B|$ is even.

Proof One needs to show that $1_B \cdot 1_Z = 0$ over $\mathbb{Z}_2$. Let G be the (d-1)-complex that induces B. Then,

$1_B \cdot 1_Z = 1_G M_d 1_Z = 1_G \cdot 0 = 0,$
where 0 is the all-zero vector. ■
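The definition of a coboundary and the parity statement of Claim 2.1 are easy to test on small instances. The following Python sketch is an illustration under our own conventions (vertices 0, ..., n-1, simplices as sorted tuples); `faces`, `coboundary` and `intersection_parities` are hypothetical names. It only checks cycles that are boundaries of (d+1)-simplices, which span the cycle space of the complete complex.

```python
from itertools import combinations

def faces(sigma):
    """The co-dimension 1 faces of a simplex given as a sorted tuple."""
    return [sigma[:i] + sigma[i + 1:] for i in range(len(sigma))]

def coboundary(G, n, d):
    """d-coboundary induced by a set G of (d-1)-simplices on {0, ..., n-1}."""
    G = set(G)
    return {s for s in combinations(range(n), d + 1)
            if sum(f in G for f in faces(s)) % 2 == 1}

def intersection_parities(B, n, d):
    """|Z ∩ B| mod 2 for every Z that is the boundary of a (d+1)-simplex."""
    return [sum(f in B for f in faces(big)) % 2
            for big in combinations(range(n), d + 2)]

# Example with d = 2 on five vertices: G consists of two disjoint edges.
G1 = {(0, 1), (2, 3)}
B1 = coboundary(G1, 5, 2)
print(len(B1))                                    # 6 triangles
print(set(intersection_parities(B1, 5, 2)))       # {0}: all intersections are even

# Coboundaries form a linear space over Z_2: B(G1) + B(G2) = B(G1 + G2).
G2 = {(0, 1), (1, 4)}
print(B1 ^ coboundary(G2, 5, 2) == coboundary(G1 ^ G2, 5, 2))  # True
```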

In fact, the above claim can be taken as an alternative definition of the coboundaries; moreover, it suffices
to consider only cycles Z of the type $\partial\Delta_{d+1}$, i.e., the boundaries of (d+1)-simplices on V.
The hypertrees and the coboundaries are related in a complementary manner:

Claim 2.2 $K \subseteq K_n^{(d)}$ is connected iff $K \cap B \neq \emptyset$ for any nonempty d-coboundary B.

Proof We first show that for any hypertree T and any nonempty coboundary B, $T \cap B$ is not empty. Indeed, let G be the
subset of $K_n^{(d-1)}$ that induces B. If $T \cap B$ is empty, $1_G$ is orthogonal to all the columns of $M_d$ corresponding
to $\sigma \in T$. But these columns span the entire column space of $M_d$, and thus B must be trivial, contrary to our
assumption. Thus, if K is connected, it intersects all the nonempty coboundaries.

Assume now that K is not connected, i.e., the columns of $M_d$ corresponding to d-simplices in K do not
span the column space. Then, there must exist a vector $1_G$ orthogonal to all these columns, but not to the entire
column space. The induced B is thus nontrivial, and disjoint from K. ■

While any $G_{d-1}$ uniquely defines a d-coboundary B, the opposite does not hold, and different G's may
induce the same B. In fact, G and G' induce the same B iff $G' = G \oplus B_{d-1}$, where $B_{d-1}$ is a
(d-1)-coboundary (see footnote 1). The ambiguity in choosing $G_{d-1}$ for a given B can be removed in the following manner. For
$X \subseteq K_n^{(d)}$ and v a vertex of X, define the link of X with respect to v to be the following (d-1)-dimensional
subcomplex of X:

$\mathrm{link}_v(X) = \{\tau \in K_n^{(d-1)} \mid v \notin \tau \text{ and } \tau \cup \{v\} \in X\}.$

Claim 2.3 A d-coboundary B is induced by $\mathrm{link}_v(B)$. Consequently, there is a 1-1 correspondence between
the (d-1)-dimensional $G_{d-1}$'s on $V - \{v\}$ and the d-coboundaries $B \subseteq K_n^{(d)}$.

Proof Let B' be the d-coboundary induced by $\mathrm{link}_v(B)$. Consider first a d-simplex $\sigma$ that contains v. Since
$\mathrm{link}_v(B)$ lacks all the (d-1)-faces of $\sigma$ containing v, and contains the remaining (d-1)-face $\tau = \sigma - \{v\}$
iff $\sigma$ belongs to B, the definition of the coboundary B' implies that $\sigma \in B'$ iff $\sigma \in B$. Consider next a d-simplex
$\sigma = \{v_1, v_2, \ldots, v_{d+1}\}$ that does not contain v. Consider the boundary of the (d+1)-simplex $\{v_1, v_2, \ldots, v_{d+1}, v\}$.
It is a d-cycle, and all its d-faces with the exception of $\sigma$ contain v. Since B' and B agree on all these faces, the parity
argument from Claim 2.1 implies that they agree on $\sigma$ as well. Thus, B' = B. ■
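Claim 2.3 can be checked mechanically on small examples: taking the link of a coboundary with respect to any vertex and inducing a coboundary from that link should return the original set. The Python sketch below is only an illustration; the vertex labels, the tuple representation and the names `coboundary` and `link` are assumptions made for this example.

```python
from itertools import combinations

def coboundary(G, n, d):
    """d-coboundary induced by a set G of (d-1)-simplices on {0, ..., n-1}."""
    G = set(G)
    result = set()
    for s in combinations(range(n), d + 1):
        odd = sum((s[:i] + s[i + 1:]) in G for i in range(len(s))) % 2
        if odd:
            result.add(s)
    return result

def link(B, v):
    """link_v(B): the (d-1)-simplices tau with v not in tau and tau + {v} in B."""
    return {tuple(x for x in s if x != v) for s in B if v in s}

# A 2-coboundary on six vertices induced by an arbitrary set of edges.
n, d = 6, 2
B = coboundary({(0, 1), (1, 2), (3, 5), (2, 4)}, n, d)

# Whatever vertex v we pick, link_v(B) induces B again.
print(all(coboundary(link(B, v), n, d) == B for v in range(n)))  # True
```

Note that the links for different v are in general different edge sets; they all induce the same coboundary, in line with the ambiguity discussed before the claim.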



2.2 Hypercuts 

The generalization of cuts in graphs to higher dimensions is not straightforward. Topologists, in view of
Claim 2.2, usually consider the coboundaries to be the proper generalization of cuts in graphs. We refine this
topological definition, arriving at a notion that makes a lot of sense also from the volume-theoretic perspective
(see Section 3 below), as well as from the viewpoint of Matroid Theory.

For $A \subseteq K_n^{(d)}$, define an equivalence relation on d-simplices, $\sigma_1 \sim \sigma_2 \bmod A$, if they are homologous
relative to A. I.e., there exists a simple d-cycle containing $\sigma_1, \sigma_2$, while the rest of its d-simplices belong to
A. In terms of the matrix $M_d$, it means the following. Let Col(X) denote the set of columns of $M_d$ indexed
by $\sigma \in X \subseteq K_n^{(d)}$. Then, $\sigma_1 \sim \sigma_2 \bmod A$ if $M_d(1_{\{\sigma_1\}} - 1_{\{\sigma_2\}}) \in \mathrm{span}\{\mathrm{Col}(A)\}$. Call a d-simplex $\sigma$ null
homologous relative to A if there exists a simple d-cycle containing $\sigma$, while the rest of its d-simplices belong
to A. Equivalently, $M_d 1_{\{\sigma\}} \in \mathrm{span}\{\mathrm{Col}(A)\}$.

Definition 1 Call $C \neq \emptyset$, a subset of d-simplices, a (combinatorial) d-hypercut if, writing
$\bar{C} = K_n^{(d)} \setminus C$ for its complement, (*) no $\sigma \in C$ is null
homologous relative to $\bar{C}$; and (**) for any $\sigma_1, \sigma_2 \in C$ it holds that $\sigma_1 \sim \sigma_2 \bmod \bar{C}$.

In other words, C is a hypercut iff $\bar{C}$ is maximal unconnected. This happens to be precisely the definition of
a co-circuit of $K_n^{(d)}$ treated as a simplicial matroid.

In terms of the matrix $M_d$, Definition 1 means the following. Let Col denote the set of columns of
$M_d$. Then, C is a hypercut iff $\mathrm{span}\{\mathrm{Col}(\bar{C})\} \cap \mathrm{Col} = \mathrm{Col}(\bar{C})$, and the co-dimension of $\mathrm{span}\{\mathrm{Col}(\bar{C})\}$ in
span(Col) is 1.

Theorem 1 d-Hypercuts are precisely the nonempty d-coboundaries that are minimal with respect to containment.
Moreover, any d-coboundary B is a disjoint union of d-hypercuts.

Footnote 1: This follows since $M_{d-1} M_d = 0$, and hence any (d-1)-coboundary is in the left kernel of $M_d$. Moreover, comparing the
dimensions of the left kernel of $M_d$ and the space of (d-1)-coboundaries, one concludes that the two are equal. Using the language
of algebraic topology, this can be restated as $H^{d-1}(K_n^{(d)}) = 0$, which in turn follows from the connectedness of $K_n^{(d)}$.

Proof The matrix definition of C implies that there exists a vector y such that $y \cdot v = 0$ for any $v \in \mathrm{Col}(\bar{C})$,
and $y \cdot v = 1$ for the rest of the columns. Thus, a d-hypercut is also a d-coboundary.

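Observe that for a d-coboundary B it always holds that $\mathrm{span}\{\mathrm{Col}(\bar{B})\} \cap \mathrm{Col} = \mathrm{Col}(\bar{B})$, where $\bar{B}$ is the
complement of B. If there exists a nontrivial d-coboundary $B' \subsetneq B$, then the following strict containments hold,

$\mathrm{span}\{\mathrm{Col}(\bar{B})\} \subsetneq \mathrm{span}\{\mathrm{Col}(\bar{B'})\} \subsetneq \mathrm{span}\{\mathrm{Col}\},$

implying that $\mathrm{span}\{\mathrm{Col}(\bar{B})\}$ has co-dimension > 1, and thus B is not a hypercut. For the other direction, if B is
minimal with respect to containment, then for any $\sigma \in B$ it must hold that $\mathrm{span}\{\mathrm{Col}(\bar{B} \cup \sigma)\} = \mathrm{span}\{\mathrm{Col}\}$, and
thus $\mathrm{span}\{\mathrm{Col}(\bar{B})\}$ has co-dimension 1, and therefore B is a hypercut.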

Finally, let B and $B' \subset B$ be coboundaries. Since coboundaries are closed under addition, $B \setminus B' = B \oplus B'$
is also a coboundary, and thus B is a disjoint union of two coboundaries. Continuing to decompose these
coboundaries, one arrives at a disjoint union of minimal coboundaries, i.e., hypercuts. ■

The following theorem is analogous to the fact that removing an edge from a spanning tree yields a cut.

Theorem 2 Let T be a d-hypertree, and $\sigma \in T$. Then there exists a unique d-hypercut $C_{T,\sigma}$ such that
$T \cap C_{T,\sigma} = \{\sigma\}$. More explicitly, $C_{T,\sigma}$ is the set of all the d-simplices $\tau$ such that the unique cycle Z created by
adding $\tau$ to T contains $\sigma$.

Proof Consider the set S of all d-simplices whose columns are spanned by $\mathrm{Col}(T - \{\sigma\})$. Observe that any
hypercut disjoint from $T - \{\sigma\}$ must also be disjoint from S. Let $C = \bar{S}$ be the complement of S. Observe that C is not empty, as $\sigma \in C$.
We claim that C is a hypercut. Indeed, (*) holds by the definition of C, while (**) holds since any d-simplex $\tau$ is
null homologous relative to T, and thus, if it is not in S, it must be homologous to $\sigma$ relative to $T - \{\sigma\}$.

■ 

As a corollary of Theorem 2 we obtain another definition of the hypercuts. 

Corollary 2.1 Let $\mathcal{C}$ be the set of d-hypercuts and let $\mathcal{T}$ be the set of d-hypertrees. Then, $\mathcal{C}$ is the blocker of
$\mathcal{T}$. That is, every hypercut intersects every hypertree, and any set $S \subseteq K_n^{(d)}$ with this property that is
minimal (with respect to containment) is a hypercut.

Proof The statement directly follows from Claim 2.2 and Theorem 2. It can also be shown within the
framework of Matroid Theory. ■

The next two results address finer issues related to hypercuts, in particular for d = 2. First, we provide
a characterization of 2-hypercuts (vs. general 2-coboundaries) in terms of their links, i.e., in purely
graph-theoretic terms.

Let G = (V, E) be a graph. Call two adjacent edges $(u, v), (u, w) \in E(G)$ V-equivalent if $(v, w) \notin E(G)$.
I.e., the restriction of G to {u, v, w} is a "V" with u at the apex. Taking the transitive closure of this relation,
we call G V-connected if any two edges of G are V-equivalent.

Theorem 3 Let B be a 2-coboundary, and let $G = \mathrm{link}_v(B)$ be its link with respect to an arbitrary vertex v.
Then, B is a 2-hypercut iff G is V-connected.

Proof Let x be a vector with coordinates indexed by the edges of $K_n$. Consider the following system of
equations in x over $\mathbb{Z}_2$. For each e containing the vertex v, $x_e = 0$; for each triangle $\sigma \notin B$, $\sum_{e \in \partial\sigma} x_e = 0$. We
claim that this system of equations has a unique nontrivial solution iff B is a hypercut. Indeed, by definition,
$x = 1_{E(G)}$ is one nontrivial solution, as $1_{E(G)}$ induces B. The existence of another nontrivial solution x' is
equivalent to the existence of a nontrivial 2-coboundary B' (induced by x') strictly contained in B, as on every
triangle $\sigma \notin B$, x' must sum to 0. Recall that different links define different coboundaries.

Assigning the forced value 0 to all $x_e$ where e contains v, and to all $x_{(a,b)}$ where the triangle $\{a, b, v\} \notin B$,
we arrive at the equivalent system of equations $x_{(a,b)} + x_{(b,c)} = 0$ whenever $a, b, c \in V - \{v\}$, and
$(a, b), (b, c) \in E(G)$, $(a, c) \notin E(G)$. Thus, the edges in the same V-equivalence class must be assigned the
same value, but there are no restrictions for edges in different V-equivalence classes. We conclude that there is
a unique nontrivial solution iff there is one V-equivalence class, i.e., G is V-connected. ■
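The V-connectivity criterion is purely graph-theoretic and can be tested directly. The following Python sketch is an illustration, not the authors' procedure: it assumes the graph is given as a list of vertex pairs and merges directly V-equivalent edges with a small union-find structure. The function names are hypothetical.

```python
from itertools import combinations

def v_equivalence_classes(edges):
    """Partition the edges of a graph into V-equivalence classes.

    Two edges (u, v), (u, w) sharing the vertex u are directly V-equivalent
    when (v, w) is not an edge; the classes are the transitive closure.
    """
    edge_set = {frozenset(e) for e in edges}
    parent = {e: e for e in edge_set}

    def find(e):
        while parent[e] != e:
            parent[e] = parent[parent[e]]  # path halving
            e = parent[e]
        return e

    for e, f in combinations(edge_set, 2):
        shares_one_vertex = len(e & f) == 1
        # e ^ f is the pair {v, w} of the two non-shared endpoints.
        if shares_one_vertex and (e ^ f) not in edge_set:
            parent[find(e)] = find(f)

    classes = {}
    for e in edge_set:
        classes.setdefault(find(e), set()).add(e)
    return list(classes.values())

def is_v_connected(edges):
    """True iff all edges lie in a single V-equivalence class."""
    return len(v_equivalence_classes(edges)) == 1

print(is_v_connected([(0, 1), (1, 2)]))          # True: a path is a single "V"
print(is_v_connected([(0, 1), (1, 2), (0, 2)]))  # False: a triangle has no "V"
```

By Theorem 3, applying `is_v_connected` to the link of a 2-coboundary decides whether the coboundary is a 2-hypercut.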

Let us comment that a random graph G on n - 1 vertices is almost surely V-connected. (This is an easy
exercise and we leave it to the reader.) Thus, in view of the above theorem, there are $2^{\Theta(n^2)}$ different
2-hypercuts.

Another comment is of a more geometrical nature. A closer look at the structure of 2-hypercuts C reveals
that not only for every two different $\sigma, \tau \in C$ there exists a cycle Z with $Z \cap C = \{\sigma, \tau\}$, but, moreover, Z
can be taken as a triangulation of the 2-sphere. This can be shown using the V-connectedness of the links of C,
first for $\sigma, \tau$ that share a common vertex, and then, using transitivity, for any $\sigma, \tau$. This observation will not be
used in the rest of this paper.

How large/small can a d-hypercut be? A partial answer is provided by the following claim. 

Claim 2.4 The size of the minimum (nonempty) d-hypercut in $K_n^{(d)}$ is n - d. The size of the maximum 2-hypercut
is $\binom{n}{3} - O(n^2)$.

Proof We start with the first statement, and prove it by induction on n, d. Since the minimum coboundary is
a hypercut, it suffices to prove it for coboundaries. The statement clearly holds for d = 1 and for n = d + 1.
Assume that the statement is true for all pairs (n', d') where $n' < n$, $d' \le d$. Let C be a nonempty
d-coboundary, and let v be a vertex. Consider $\mathrm{link}_v(C)$. Then, $|C| = |C'| + |\mathrm{link}_v(C)|$, where C' is the
restriction of C to $V - \{v\}$, clearly a d-coboundary of $K_{n-1}^{(d)}$. Recall that $\mathrm{link}_v(C)$ cannot be empty. If $C' \neq \emptyset$,
then by the inductive hypothesis $|C| \ge (n - 1 - d) + 1 = n - d$. Otherwise, by the previous discussion,
$\mathrm{link}_v(C)$ must be a (d-1)-coboundary of $K_{n-1}^{(d-1)}$, and thus by the inductive hypothesis $|C| = |\mathrm{link}_v(C)| \ge
(n - 1) - (d - 1) = n - d$. The bound is tight, as shown by the d-hypercut that consists of all the d-simplices
containing a fixed (d-1)-simplex $\tau$.

Let us just mention here without further elaboration that an alternative proof of the first statement can be 
obtained using the tools from the theory of simplicial matroids (see, e.g., [10] for a survey of this theory.) 

For the second statement, consider the 2-coboundary B of $K_n^{(2)}$ whose link is the complete graph on n - 1
points minus a Hamiltonian cycle. It is easy to verify that the criterion of Theorem 3 holds, and thus B is a
2-hypercut. A simple calculation shows that for n > 5, $|B| = \binom{n}{3} - (n - 1)(n - 4)$. ■
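The size formula in the second statement can be confirmed by direct enumeration for small n. The sketch below is illustrative only: it assumes the vertices are 0, ..., n-1, that the distinguished vertex is v = n-1, and that the link lives on the remaining n-1 vertices. The helper names are hypothetical.

```python
from itertools import combinations
from math import comb

def complete_minus_hamiltonian_cycle(m):
    """Edges of K_m on {0, ..., m-1} minus the cycle 0-1-...-(m-1)-0."""
    cycle = {frozenset((i, (i + 1) % m)) for i in range(m)}
    return [e for e in combinations(range(m), 2) if frozenset(e) not in cycle]

def coboundary_from_link(link_edges, n):
    """2-coboundary on {0, ..., n-1} induced by edges avoiding the vertex n-1.

    A triangle belongs to the coboundary iff an odd number of its three
    edges lies in link_edges.
    """
    G = {frozenset(e) for e in link_edges}
    return {t for t in combinations(range(n), 3)
            if sum(frozenset(p) in G for p in combinations(t, 2)) % 2 == 1}

for n in range(6, 11):
    B = coboundary_from_link(complete_minus_hamiltonian_cycle(n - 1), n)
    print(n, len(B), comb(n, 3) - (n - 1) * (n - 4))  # the two counts agree
```

The count behind the formula: the triangles outside B are the n-1 triangles spanned by v and a cycle edge, together with the (n-1)(n-5) triangles on V - {v} that contain exactly one cycle edge.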

We conclude this section with a result about the distribution of the sizes of d-hypercuts in $K_n^{(d)}$, in particular
when d = 2. It should be noted that a similar but weaker result was shown earlier in [16] employing a somewhat
more involved argument.

Theorem 4 The number of d-hypercuts of size $\alpha n$ is at most $n^{c_d \alpha}$, where $c_d$ can be (very roughly)
upper-bounded by d(d + 1). For d = 2 we show a better upper bound of {4n)^°''^^.

Proof Since $|C| = \alpha n$, the average size of $|\mathrm{link}_v(C)|$ is $(d + 1)\alpha$, and therefore there exists a vertex v such
that $|\mathrm{link}_v(C)| \le (d + 1)\alpha$. Thus, C is induced by a G of size at most $(d + 1)\alpha$. However, since G is a subset of
the $\binom{n-1}{d}$ possible (d-1)-simplices on $V - \{v\}$, the number of such G's is $O(n^{d(d+1)\alpha})$. For d = 2 we know that
G is V-connected, hence
it has at most one nontrivial component, containing at most $3\alpha$ edges and $3\alpha + 1$ vertices. Thus, the number of
such G's is at most 

m 
2.3 Geometrical Hypercuts

Geometrical hypercuts are a very special subfamily of the more general combinatorial hypercuts. They can 
be regarded as a different generalization of graph cuts to higher dimensions. Their definition is quite intuitive, 
but it takes some effort to show that they are indeed hypercuts. As we shall see, they are particularly useful in 
dealing with Euclidean realizations of simplicial complexes.

Definition 2 Let $\phi : V \to S^{d-1}$, the unit sphere of dimension d - 1, be such that the points in the image are in
general position. The geometric hypercut $C_\phi$ is defined as the set of d-simplices whose image under $\phi$ (i.e., the
convex hull of the images of their vertices) contains the origin.

Theorem 5 Every geometrical d-hypercut C is a combinatorial d-hypercut. 

Proof We start by showing that for any $\sigma_1, \sigma_2 \in C$ it holds that $\sigma_1 \sim \sigma_2 \bmod \bar{C}$, where $\bar{C}$ is the complement of C. Assume first that
the two simplices are disjoint. We use the following cylindric construction. Consider two parallel copies of
$\mathbb{R}^d$ in $\mathbb{R}^{d+1}$, each containing $S^{d-1}$ with the $\phi$-image of V. Choose $\sigma_1$ from the first copy, and $\sigma_2$ from the second
copy. Then, by the general position argument, the boundary of $\mathrm{conv}(\sigma_1 \cup \sigma_2) \subset \mathbb{R}^{d+1}$ is triangulated by
d-simplices. For every d-simplex in this triangulation, consider the corresponding abstract simplex in $K_n^{(d)}$.
An easy projection argument implies that all the simplices resulting from the lateral d-simplices in the above
triangulation (i.e., all but $\sigma_1$ and $\sigma_2$) are in $\bar{C}$. Since the union of all the d-simplices in the above triangulation
forms a cycle (even over $\mathbb{Z}$), the statement follows. If the two simplices $\sigma_1$ and $\sigma_2$ are not disjoint, we make
the two copies of $\mathbb{R}^d$ intersect, such that all the common vertices (and only them) lie in the intersection, and
proceed in the same manner.

We next argue that no $\sigma \in C$ is null homologous relative to $\bar{C}$, the complement of C. Assume to the contrary that there exists a
d-cycle Z such that $Z \cap C$ contains a single simplex $\sigma$ containing the origin. Using the central projection, we
conclude that the realization of $\partial\sigma = \partial(Z - \sigma)$ is a retract of the realization of Z. This can be refuted using
standard basic algebraic topology arguments, e.g., Sperner's Lemma. Although classically Sperner's Lemma is
used in a weaker setting, it can be easily modified to apply here. In addition to the classical argument, one needs
only to notice that since Z is a cycle over $\mathbb{Z}_2$, the colored sub-simplices lying in the abstract (d-1)-subsimplices
of Z (with the exception of $\partial\sigma$) appear an even number of times in the Sperner sum, and thus contribute nothing.

■ 

An important property of geometric d-hypercuts is that the size of the intersection of such a d-hypercut with
a d-cycle Z that is the boundary of a (d+1)-simplex is either 0 or 2. For combinatorial d-hypercuts this number can
be any even value between 0 and d + 2. While this property does not characterize geometrical d-hypercuts, at
least for d = 2 it comes close (see [12]). Moreover, using this property and the discussion following Claim 2.1,
one gets another, less geometrical, proof of Theorem 5.
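For d = 2 the 0-or-2 property is easy to observe numerically. The Python sketch below is an illustration under stated assumptions: points on the unit circle are given by angles in general position (distinct, no two antipodal), triangles are index triples, and containment of the origin is decided by the signs of three cross products. The names `contains_origin` and `geometric_hypercut` are hypothetical.

```python
from itertools import combinations
from math import cos, sin

def contains_origin(a, b, c):
    """True iff the origin lies strictly inside the planar triangle abc."""
    cross = lambda p, q: p[0] * q[1] - p[1] * q[0]
    s1, s2, s3 = cross(a, b), cross(b, c), cross(c, a)
    return (s1 > 0 and s2 > 0 and s3 > 0) or (s1 < 0 and s2 < 0 and s3 < 0)

def geometric_hypercut(angles):
    """Index triples whose triangle on the unit circle contains the origin."""
    pts = [(cos(t), sin(t)) for t in angles]
    return {t for t in combinations(range(len(pts)), 3)
            if contains_origin(*(pts[i] for i in t))}

# Seven points in general position on the circle (angles chosen arbitrarily).
angles = [0.1, 0.9, 2.0, 2.7, 3.9, 5.0, 5.8]
C = geometric_hypercut(angles)

# Intersect C with the boundary of every 3-simplex (four triangles each).
sizes = {sum(face in C for face in combinations(quad, 3))
         for quad in combinations(range(len(angles)), 4)}
print(sizes <= {0, 2})  # True
```

The cross-product test works because, for a triangle abc, the origin is on the same side of all three edges exactly when a×b, b×c and c×a have the same sign.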

Only a tiny portion of combinatorial hypercuts are geometric. E.g., for d = 2, the number of d-hypercuts is
$2^{\Theta(n^2)}$, as observed above, while the number of geometrical d-hypercuts can be shown to be $2^{\Theta(n \log n)}$. This is
the number of distinct (with respect to the induced geometrical cuts) possible configurations of n points on the
circle.

We conclude this section by mentioning a special subfamily of the geometric hypercuts, which also
was suggested as a reasonable generalization of graph cuts. Partition d-hypercuts, studied e.g. in [17, 25],
correspond to partitions $\mathcal{P} = \{V_1, \ldots, V_{d+1}\}$ of V into (d + 1) disjoint nonempty parts. The hypercut $C_\mathcal{P}$ is
defined as $C_\mathcal{P} = \{\sigma \in K_n^{(d)} \mid |\sigma \cap V_i| = 1, \; i = 1, 2, \ldots, d + 1\}$. It is easily verified that $C_\mathcal{P}$ is a geometrical
hypercut, and thus a hypercut.

The following problem of Graham pertaining to the partition hypercuts reflects the history of the early
attempts at the proper definition of hypertrees, hypercuts etc. Graham defines a d-forest $F_d \subseteq K_n^{(d)}$ as a
collection of d-simplices, such that for every $\sigma \in F_d$ there exists a partition hypercut C such that $F_d \cap C = \{\sigma\}$.
The problem was to estimate the maximum possible size of a d-forest. It was solved by Lovász [17, 25] by
introducing new (at the time) algebraic methods.

Observe that the theory we have discussed so far allows one to solve Graham's problem in a rather obvious
manner. Claim 2.1 implies that $F_d$ is acyclic, hence, by the discussion in Section 2.1, $|F_d| \le \binom{n-1}{d}$. The
tightness of the bound is witnessed by the d-hypertree containing all the d-simplices that contain a fixed vertex
$v \in V$.



3 Abstract Volumes 

3.1 Basic Notions 

Let $K_n^{(\le d)}$ be the simplicial complex on the underlying set V of size n containing all the simplices of dimension
$\le d$ on V. We define the abstract d-dimensional volume function $\mathrm{vol}^{(d)} : K_n^{(\le d)} \to \mathbb{R}_+$ as a real nonnegative
function with the following properties: (*) the simplices of dimension < d have value 0; (**) the values of
d-simplices satisfy the following generalization of the triangle inequality:

For every d-cycle Z of $K_n^{(d)}$ and every $\sigma \in Z$, it holds that $\sum_{\tau \in Z - \sigma} \mathrm{vol}^{(d)}(\tau) \ge \mathrm{vol}^{(d)}(\sigma)$. (1)

It is easy to verify that for d > 1 the condition (**) cannot be replaced by a requirement on cycles of bounded 
size. 

The most natural example of a volume function is the Euclidean volume: given an embedding $\phi$ of V into
a Euclidean space, the volume of a d-simplex $\sigma$ is the Euclidean d-volume of $\mathrm{conv}(\phi(\sigma))$.

Another important example is the analog of the shortest-path metric. Let $X \subseteq K_n^{(d)}$ be a connected (i.e.,
containing a d-hypertree) subcomplex with nonnegative weights $w_\sigma$ on its d-simplices. The volume $\mathrm{vol}_X$ induced
by X on $K_n^{(d)}$ is defined by $\mathrm{vol}_X(\sigma) = \min_{D_\sigma \subseteq X} \sum_{\sigma' \in D_\sigma} w_{\sigma'}$, where $D_\sigma$ is a $\sigma$-cap, i.e., $\sigma \cup D_\sigma$ is a cycle. (In
particular, $\sigma$ itself is a $\sigma$-cap.)

The last example is that of cut volumes, which play a central role in this paper. Let C be a d-hypercut in $K_n^{(d)}$.
The corresponding volume function $\mathrm{vol}_C^{(d)}$ assigns 1 to every $\sigma \in C$, and 0 to every $\sigma \notin C$. To see that a
cut volume is indeed a volume, it suffices to notice that a 0/1 function on d-simplices may fail to be a volume
function iff there exists a cycle Z where all but one $\sigma \in Z$ have value 0. By Claim 2.1, such Z does not exist for
a cut volume.

Volume functions on V are closed under addition and multiplication by a nonnegative constant, and thus form a cone
in $\mathbb{R}^{\binom{n}{d+1}}$. The extremal volumes in this cone are, as always, of particular interest. The following theorem
provides a full characterization of 0/1 extremal volumes. Perhaps more importantly, it also establishes their
inapproximability by any other volume function.

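The multiplicative distortion between two d-volume functions $\mathrm{vol}_1$ and $\mathrm{vol}_2$ on V is defined similarly to
the metric distortion, i.e.,

$\mathrm{dist}(\mathrm{vol}_1, \mathrm{vol}_2) = \max_\sigma \frac{\mathrm{vol}_1(\sigma)}{\mathrm{vol}_2(\sigma)} \cdot \max_\sigma \frac{\mathrm{vol}_2(\sigma)}{\mathrm{vol}_1(\sigma)}.$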
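The definition translates directly into code. The following Python sketch is illustrative only and makes two assumptions that are not spelled out in the text: volume functions are dictionaries from d-simplices to nonnegative numbers over the same key set, and a simplex on which exactly one of the two functions vanishes makes the distortion infinite, while simplices on which both vanish are ignored. The name `distortion` is hypothetical.

```python
import math

def distortion(vol1, vol2):
    """Multiplicative distortion between two volume functions.

    Convention assumed here: simplices where both values are 0 are skipped,
    and a simplex where exactly one value is 0 gives infinite distortion.
    """
    max12 = max21 = 0.0
    for s in vol1:
        a, b = vol1[s], vol2[s]
        if a == 0 and b == 0:
            continue
        if a == 0 or b == 0:
            return math.inf
        max12 = max(max12, a / b)
        max21 = max(max21, b / a)
    if max12 == 0.0:  # both functions are identically zero
        return 1.0
    return max12 * max21

u = {(0, 1, 2): 1.0, (0, 1, 3): 1.0}
w = {(0, 1, 2): 1.0, (0, 1, 3): 4.0}
print(distortion(u, w))  # 4.0
print(distortion(u, {(0, 1, 2): 1.0, (0, 1, 3): 0.0}))  # inf
```

The product of the two maxima is invariant under rescaling either function, so proportional volume functions have distortion 1.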

Theorem 6 A 0/1 volume function $\mathrm{vol}^{(d)}$ is extremal iff it is a cut volume. Moreover, the distortion between
such $\mathrm{vol}^{(d)}$ and any other volume function $\mathrm{vol}_1^{(d)}$ is infinite unless $\mathrm{vol}_1^{(d)} = a \cdot \mathrm{vol}^{(d)}$ for some positive constant
a.

Proof Let $\mathrm{vol}^{(d)}$ be a cut d-volume function defined by a hypercut C, and let $\bar{C}$ denote the complement of C. Assume that $\mathrm{vol}^{(d)} = \mathrm{vol}_1^{(d)} + \mathrm{vol}_2^{(d)}$.
Consider $\mathrm{vol}_1^{(d)}$. It must be 0 outside of C. Since any two $\sigma, \sigma' \in C$ satisfy $\sigma \sim \sigma' \bmod \bar{C}$, there exists a
cycle $Z = Z_{\sigma,\sigma'}$ such that $Z \cap C = \{\sigma, \sigma'\}$. Since all the d-simplices in $\bar{C}$ have volume 0, the generalized
triangle inequality implies that $\mathrm{vol}_1^{(d)}(\sigma) = \mathrm{vol}_1^{(d)}(\sigma')$. Thus, $\mathrm{vol}_1^{(d)} = a \cdot \mathrm{vol}^{(d)}$, as claimed.

For the other direction, consider an extremal 0/1 d-volume function $\mathrm{vol}^{(d)}$. Define $C \subseteq K_n^{(d)}$ as $C =
\{\sigma \mid \mathrm{vol}^{(d)}(\sigma) = 1\}$. Clearly, no $\sigma \in C$ is null homologous relative to $\bar{C}$, since otherwise the generalized
triangle inequality would imply $\mathrm{vol}^{(d)}(\sigma) = 0$. Consider the equivalence relation $\sim$ on C, i.e., the homology
mod $\bar{C}$. It suffices to show that it has a single equivalence class. Assume to the contrary that there is
an equivalence class C' strictly contained in C. Define $\mathrm{vol}_1^{(d)}$ and $\mathrm{vol}_2^{(d)}$ as follows. Outside of C both are 0.
For $\sigma \in C \setminus C'$, $\mathrm{vol}_1^{(d)}(\sigma) = \mathrm{vol}_2^{(d)}(\sigma) = 0.5$; for $\sigma \in C'$, $\mathrm{vol}_1^{(d)}(\sigma) = 0.4$, and $\mathrm{vol}_2^{(d)}(\sigma) = 0.6$.
The definition of C' implies that both $\mathrm{vol}_1^{(d)}$ and $\mathrm{vol}_2^{(d)}$ are volume functions, contradicting the assumption that
$\mathrm{vol}^{(d)}$ is extremal.

The second statement follows easily along the same line of reasoning. The support of any volume function
approximating such $\mathrm{vol}^{(d)}$ must coincide with the support of $\mathrm{vol}^{(d)}$, and moreover, arguing as above, it must be
constant on it. ■

The above theorem provides an additional motivation for our definition of hypercuts, this time from the
volume-theoretic perspective.

Much of the modern theory of finite metric spaces is devoted to the study of special metric classes that
constitute a sub-cone of the metric cone, notably $\ell_1$ metrics and NEG-type metrics. Crucially for applications,
any metric on n points can be approximated by a special metric with a bounded distortion $c_n$. E.g., for $\ell_1$ metrics,
the rough bound of O(n) on distortion follows from the minimum spanning tree argument, and the much better
O(log n) bound is implied by Bourgain's Theorem [8]. Theorem 6 implies that any (closed) sub-cone of volume
functions with the approximation property must contain the cone spanned by the cut volumes. Moreover, as we
shall soon see, this cone already has the required property. This justifies the following definition.

Definition 3 Analogously to the one-dimensional case, we define $\ell_1$ d-volumes to be the nonnegative combinations
of cut d-volumes.

Clearly, $\ell_1$ d-volumes constitute a sub-cone of d-volumes.

3.2 $\ell_1$ Volumes

The most basic properties of $\ell_1$ metrics are that they contain the class of tree-metrics and the class of Euclidean
metrics. The situation with $\ell_1$ d-volumes turns out to be fully analogous.

Theorem 7 Let T be a (spanning) d-hypertree with nonnegative weights on the d-simplices. Then, the induced
d-volume function $\mathrm{vol}_T^{(d)}$ is $\ell_1$.

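Proof Recall the definition of $C_{T,\sigma}$ from Theorem 2, and let $w_\sigma$ denote the weight of $\sigma \in T$. We claim that
$\mathrm{vol}_T^{(d)} = \sum_{\sigma \in T} w_\sigma \cdot \mathrm{vol}_{C_{T,\sigma}}^{(d)}$. For $\tau \in T$ this
follows at once, while for $\tau \notin T$, $\sum_{\sigma \in T} w_\sigma \cdot \mathrm{vol}_{C_{T,\sigma}}^{(d)}(\tau)$ is equal to the sum of weights of all the $\sigma$'s in T belonging
to the cycle created by adding $\tau$ to T, as it should be. ■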

This implies the following approximability result.

Theorem 8 Any d-volume on V can be approximated by an $\ell_1$ d-volume with distortion at most $\binom{n-1}{d}$.

Proof Let $\mathrm{vol}^{(d)}$ be a d-volume function on $K_n^{(d)}$ and let T be the minimum (spanning) hypertree with
respect to $\mathrm{vol}^{(d)}$. Then, for $\sigma \in T$, $\mathrm{vol}_T^{(d)}(\sigma) = \mathrm{vol}^{(d)}(\sigma)$. For $\sigma \notin T$, much like for the MST in graphs,
$\sigma$ must be the heaviest d-simplex in the cycle Z created by adding $\sigma$ to T. Since the size of Z is at most
$1 + |T| \le 1 + \binom{n-1}{d}$, the statement follows. ■

While the upper bound on distortion of Theorem 8 is probably too rough and the true exponent of n is
probably smaller, we shall see in what follows that even for d = 2 the distortion can be as large as $\tilde{\Omega}(n^{1/5})$.
Thus, in general it is polynomial, and not logarithmic as in the case d = 1 (Bourgain's Theorem [8]).
Another important difference between d = 1 and d = 2 is that the Euclidean 2-volumes, and in fact even
their nonnegative combinations, are unable to approximate at all even the simplest 2-volume functions. E.g., set
V = {0, 1, 2, 3, 4} and vol({i, i + 2, i + 3}) = 1, where + is taken mod 5, and vol($\sigma$) = 0 for any other $\sigma$.
It is easy to see that this function is a volume, and in fact a geometrical cut volume. However, any geometrical
realization that approximates it cannot map any two points to the same location, which implies in turn that it must assign a strictly
positive volume to some {i, i + 1, i + 2} simplex.

Next we address the containment of Euclidean volumes in $\ell_1$-volumes.

Theorem 9 Any Euclidean d-volume is an $\ell_1$ d-volume. In fact, it is a nonnegative combination of geometrical
hypercuts.

Proof (Sketch) The proof proceeds in three steps. First, observe that the random projection of a finite
dimensional Euclidean space on $\mathbb{R}^d$ preserves (in expectation) the d-volumes up to scaling. Thus, it suffices to
consider Euclidean d-volumes realizable in $\mathbb{R}^d$. Next, observe that given an embedding of V points in $\mathbb{R}^d$, the
corresponding Euclidean volume function $\mathrm{vol}^{(d)}$ satisfies $\mathrm{vol}^{(d)} = \int_{\mathbb{R}^d} \mathrm{vol}_p^{(d)}\,dp$, where $\mathrm{vol}_p^{(d)}(\sigma) = 1$ if the
realization of $\sigma$ contains p, and 0 otherwise. Treating p in $\mathrm{vol}_p^{(d)}$ as the origin, one can realize the same function
by projectively mapping the points to $S^{d-1}$, which implies that $\mathrm{vol}_p^{(d)}$ is a geometrical cut volume. A measure
argument takes care of the degeneracies. Finally, by Theorem 5, every geometrical hypercut is a (combinatorial)
hypercut, and thus we get an $\ell_1$ volume with the same values as the original Euclidean volume. ■

The main negative result of this section is the following lower bound on the distortion of approximating general
2-volumes by $\ell_1$ 2-volumes. On the way we define a d-dimensional analog of the graphical 'edge-expansion',
which is of independent interest.

Theorem 10 There exists a 2-volume function such that any $\ell_1$ volume distorts it by at least $\tilde{\Omega}(n^{1/5})$.

Let us first outline the proof. Using the methods originally developed for the one-dimensional case, we construct
a connected 2-dimensional simplicial complex K with unit weights on its 2-simplices, such that on one hand
it has a constant normalized expansion, and on the other hand $\mathrm{vol}_K$ has large average value. The existence of
such K implies that the distortion of embedding $\mathrm{vol}_K$ into $\ell_1$ is large. Formally, given K as above, consider the
following Poincaré-type form over the 2-volumes:

$$F_K(\mathrm{vol}) = \frac{\mathrm{av}_K(\mathrm{vol})}{\mathrm{av}(\mathrm{vol})}, \qquad (2)$$

where $\mathrm{av}_K(\mathrm{vol}) = \frac{1}{|K|}\sum_{\sigma \in K} \mathrm{vol}(\sigma)$ and $\mathrm{av}(\mathrm{vol}) = \binom{n}{3}^{-1} \sum_{\sigma \in K_n^{(2)}} \mathrm{vol}(\sigma)$. By a standard argument frequently used in the theory of metric spaces,
the distortion of embedding $\mathrm{vol}_K$ into $\ell_1$ is lower-bounded by

$$\mathrm{dist}(\mathrm{vol}_K \hookrightarrow \ell_1) \;\ge\; \frac{\min_{\mathrm{vol} \in \ell_1} F_K(\mathrm{vol})}{F_K(\mathrm{vol}_K)}. \qquad (3)$$

Keeping in mind that K is unit-weighted, and that any $\mathrm{vol} \in \ell_1$ is a nonnegative combination of cut volumes,
we conclude that the above minimum necessarily occurs on a cut volume, and thus Eq. (3) becomes:

$$\mathrm{dist}(\mathrm{vol}_K \hookrightarrow \ell_1) \;\ge\; \mathrm{av}(\mathrm{vol}_K) \cdot \min_{C:\ 2\text{-hypercut}} \frac{|K \cap C|/|C|}{|K|/\binom{n}{3}}. \qquad (4)$$

Observe that for a graph G the analogous expression

$$\min_{C = E(A,\bar{A}):\ \text{cut}} \frac{|E(G) \cap C|/|C|}{|E(G)|/\binom{n}{2}} \;=\; \min_{A \subset V,\ |A| \le n/2} \left( \frac{|E(A,\bar{A})|}{|A|} \cdot \frac{1}{\text{average degree of } G} \right) \cdot \frac{n-1}{n-|A|}$$

is the normalized edge expansion of G up to a factor of 2. By analogy, we define

Definition 4 Let the normalized (face) expansion of $K \subseteq K_n^{(2)}$ be the value of

$$\min_{C:\ 2\text{-hypercut}} \frac{|K \cap C|/|C|}{|K|/\binom{n}{3}}.$$

I.e., the normalized expansion of K is the ratio between the minimum density of K with respect to a hypercut,
and the density of K with respect to $K_n^{(2)}$.
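
The graph identity that motivates Definition 4 can be verified by brute force on a small example. The sketch below is illustrative only: the function names and the example graph (a 6-cycle with one chord) are assumptions of ours, and exact rational arithmetic is used so that the two sides can be compared for equality.

```python
from fractions import Fraction
from itertools import combinations
from math import comb

def cut_density_ratio(n, edges, A):
    """(|E ∩ C| / |C|) / (|E| / C(n,2)) for the cut C = E(A, complement of A)."""
    A = set(A)
    crossing = sum(1 for u, v in edges if (u in A) != (v in A))
    return Fraction(crossing, len(A) * (n - len(A))) / Fraction(len(edges), comb(n, 2))

def expansion_form(n, edges, A):
    """(|E(A, A^c)| / |A|) * (1 / average degree) * (n - 1) / (n - |A|)."""
    A = set(A)
    crossing = sum(1 for u, v in edges if (u in A) != (v in A))
    avg_deg = Fraction(2 * len(edges), n)
    return Fraction(crossing, len(A)) / avg_deg * Fraction(n - 1, n - len(A))

# Hypothetical example graph: a 6-cycle plus the chord (0, 3).
n = 6
edges = [(i, (i + 1) % n) for i in range(n)] + [(0, 3)]
subsets = [A for k in range(1, n // 2 + 1) for A in combinations(range(n), k)]
normalized = min(cut_density_ratio(n, edges, A) for A in subsets)
```

Since $1 \le (n-1)/(n-|A|) < 2$ whenever $|A| \le n/2$, the quantity `normalized` agrees with the usual normalized edge expansion up to a factor of 2.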

Let $K^{(2)}(n,p)$ be the 2-dimensional analog of the Erdős–Rényi G(n,p), where each $\sigma \in K_n^{(2)}$ is selected with
probability p = 25 log n/n randomly and independently from the others. Theorem 10 follows from the following
two Lemmas.

Lemma 3.1 For $K \in K^{(2)}(n,p)$ as above, $\mathrm{av}(\mathrm{vol}_K) \ge \tilde{\Omega}(n^{1/5})$ with probability 1 - o(1).

Lemma 3.2 The face expansion of $K \in K^{(2)}(n,p)$ is almost surely $\ge 0.5$.

Observe that Lemma 3.2 implies that K is connected, since if all 2-hypercuts meet K, then by Corollary 2.1 K
must contain a (spanning) 2-hypertree. Thus, it strengthens the main result of [16] at the price of getting worse
constants.

Before starting with the proof of Lemma 3.1, let us first establish the following combinatorial result. 

Lemma 3.3 Let Z be a 2-dimensional cycle. Then $|V(Z)| \le |Z|/2 + 2$.
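
The inequality of Lemma 3.3 is tight for triangulated spheres, which is easy to confirm on two small cycles. This is only an illustrative check; the helper names and the two example complexes (the boundary of a tetrahedron and an octahedron) are our own choices.

```python
from collections import Counter
from itertools import combinations

def is_z2_cycle(faces):
    """A set of triangles is a Z_2-cycle iff every edge lies in an even number of them."""
    count = Counter(e for f in faces for e in combinations(sorted(f), 2))
    return all(c % 2 == 0 for c in count.values())

def satisfies_lemma(faces):
    """Check |V(Z)| <= |Z|/2 + 2, written without fractions."""
    vertices = {v for f in faces for v in f}
    return 2 * len(vertices) <= len(faces) + 4

tetrahedron = list(combinations(range(4), 3))           # 4 faces on 4 vertices
equator = [(1, 2), (2, 3), (3, 4), (4, 1)]
octahedron = [(p, a, b) for p in (0, 5) for a, b in equator]   # 8 faces on 6 vertices
```

Both examples satisfy the bound with equality, matching the Euler characteristic 2 of the sphere.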

Proof Clearly, $\mathrm{link}_v(Z)$ is an Eulerian (1-dimensional) graph. As long as there is a vertex $v \in V(Z)$ for
which $\mathrm{link}_v(Z)$ is not a simple cycle, do the following. Let $A_1, \ldots, A_r$ be the decomposition of $\mathrm{link}_v(Z)$ into
edge-disjoint cycles. We introduce a new copy $v_i$ of v, $i = 1, \ldots, r$, for each $A_i$, and replace each original
2-simplex {v, x, y} containing v with a new 2-simplex $\{v_i, x, y\}$ where $(x, y) \in A_i$. This yields a new simple
cycle Z'. Carry on with this process on Z', etc. Since each time we produce a new 2-cycle with the same
number of faces, but fewer vertices whose link is not a simple cycle, the process must terminate with a 2-cycle
Z* with all links being simple cycles. Such Z*, using the language of algebraic topology, is a (vertex-) disjoint
union of triangulations of 2-dimensional surfaces without boundary. Without loss of generality, assume that
there is a single surface. It is known [20] that its Euler characteristic satisfies

$$|V(Z^*)| - |E(Z^*)| + |Z^*| \le 2. \qquad (5)$$

Observe that every edge e in Z* appears in exactly two faces, and thus $2|E(Z^*)| = 3|Z^*|$. Plugging this
into Equation (5) implies the Lemma for Z*, and hence for Z. We note that while this proof uses
Equation (5), which is non-trivial and outside of this context, there is also an elementary proof using reduction
to smaller n's. ■

Next, we address Lemma 3.1.

Proof (of Lemma 3.1) By the Markov inequality K almost surely contains $o(n^3)$ 2-simplices, and thus $\mathrm{av}(\mathrm{vol}_K)$
is determined by the 2-simplices $\sigma \notin K$. For each such $\sigma$, $\mathrm{vol}_K(\sigma)$ is the size of the smallest K-cap of $\sigma$, i.e.,
the minimum subset of simplices in K that together with $\sigma$ form a simple cycle. Let us denote this cap by
$\mathrm{Cap}_K(\sigma)$. Thus, to show that $\mathrm{av}(\mathrm{vol}_K) \ge \Omega(\lambda)$ (w.h.p.), it suffices to argue that the number of $\sigma \notin K$ for
which the corresponding $\mathrm{Cap}_K(\sigma)$ has size less than $\lambda$ is $o(n^3)$ (w.h.p.). Let $N_\lambda$ be this number. Let $U_k$ be the
number of simple cycles of size exactly k in $K_n^{(2)}$. Then,



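$$E[N_\lambda] \;\le\; \sum_{k=4}^{\lambda} k \cdot U_k \cdot p^{k-1}. \qquad (6)$$

Now, by Lemma 3.3, a cycle of size k has at most k/2 + 2 vertices. Fixing t = k/2 + 2 vertices, the number of
size-k cycles on these vertices is clearly bounded by $t^{3k}$. Hence $U_k \le (k/2+2)^{3k} \cdot \binom{n}{k/2+2} \le n^2\,(k^{5/2}\sqrt{n})^k$.
Plugging this bound on $U_k$ and the value of p into Equation (6), we get

$$E[N_\lambda] \;\le\; n^2 \sum_{k=4}^{\lambda} k\,(k^{5/2}\sqrt{n})^k \left(\frac{25\log n}{n}\right)^{k-1} \;=\; \frac{n^3}{25\log n} \sum_{k=4}^{\lambda} k \left(\frac{25\,k^{5/2}\log n}{\sqrt{n}}\right)^{k}.$$

Choosing $\lambda = c\,(\sqrt{n}/\log n)^{2/5}$ for a suitably small constant $c > 0$, the sum is dominated by its first term, and we conclude
that $E[N_\lambda] = O(n \log^3 n) = o(n^3)$, and by the Markov inequality we are
done. ■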



Proof (of Lemma 3.2) For a hypercut C, let $\gamma_K(C) = \frac{|K \cap C|/|C|}{|K|/\binom{n}{3}}$. We shall first estimate the probability that
$\gamma_K(C) < 0.5$ for any fixed hypercut C, and then use the union bound to conclude that almost surely no such
hypercut exists.

Observe first that |K| is almost surely tightly concentrated around its mean, which is $E[|K|] = p \cdot \binom{n}{3}$. Thus
instead of discussing $\gamma_K(C)$ we may safely discuss $\gamma'_K(C) = \frac{|K \cap C|}{p\,|C|}$. Next, observe that $|K \cap C|$ is a
sum of |C| i.i.d. Bernoulli variables, and its expectation is precisely p|C|. Thus, by the Chernoff bound,

$$\Pr(\gamma'_K(C) < 0.5) = \Pr(|K \cap C| < p \cdot |C|/2) \le e^{-p|C|/8}.$$

Let nis be the number of 2-hypercuts of size s in Kn \ By Theorem 4, < (4ra)^"'"^*/'*. Thus, the union 
bound implies that the probability that a bad C exists is at most 

■s-^ /Q x-^ / 25 log n . 3 log(4n) N 

J2 rus ■ e-P-'l^ < 4n ^ el 8 " + " > = o(l) . 

s>n—2 s>n—2 



3.3 Geometrical $\ell_1$ Volumes, Exact and Negative Type Functions

By geometrical $\ell_1$ volumes we mean nonnegative sums of geometrical cut volumes. As implied by Theorem 9,
Euclidean volumes belong to this class. The following examples show that geometrical $\ell_1$ volumes capture
other geometrically defined volume functions as well.

Example 1. Let f be a nonnegative weighting of the (d - 1)-simplices of $K_n^{(d)}$. Define a d-volume function
$\mathrm{vol}^{(d)}$ on $K_n^{(d)}$ by

$$\mathrm{vol}^{(d)}(\sigma) = \sum_{(d-1)\text{-simplex } \tau \subset \sigma} f(\tau).$$

Then, $\mathrm{vol}^{(d)}$ is a geometrical $\ell_1$ volume since it can be represented by $\mathrm{vol}^{(d)} = \sum_{\tau \text{ is a } (d-1)\text{-simplex}} f(\tau) \cdot \mathrm{vol}_\tau^{(d)}$,
where $\mathrm{vol}_\tau^{(d)}$ is a (geometrical) cut volume assigning 1 to the d-simplices containing $\tau$, and 0 to the rest.
In particular, the Euclidean perimeter, surface area, etc., are geometric $\ell_1$ d-volumes.
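
The decomposition in Example 1 can be spelled out on a toy instance. The sketch below assumes d = 2 on 5 vertices and a made-up weighting f; neither is taken from the paper.

```python
from itertools import combinations

n, d = 5, 2
facets = list(combinations(range(n), d))          # (d-1)-simplices have d vertices
simplices = list(combinations(range(n), d + 1))   # d-simplices have d+1 vertices
f = {t: 1 + sum(t) for t in facets}               # hypothetical nonnegative weighting

def vol(s):
    """Sum of the weights of the (d-1)-faces of the d-simplex s."""
    return sum(f[t] for t in combinations(s, d))

def cut_volume(t, s):
    """Cut volume of the facet t: 1 on the d-simplices containing t, 0 elsewhere."""
    return 1 if set(t) <= set(s) else 0

# The same function written as a nonnegative combination of cut volumes.
decomposed = {s: sum(f[t] * cut_volume(t, s) for t in facets) for s in simplices}
```

For d = 2 with f equal to edge lengths this is exactly the perimeter of a triangle.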

Example 2. Let $\mathcal{H}$ be a family of n affine hyperplanes in general position in $\mathbb{R}^d$, indexed by [n]. Assign to
every d-simplex of $K_n^{(d)}$ the Euclidean volume of the unique bounded cell of $\mathbb{R}^d$ formed by the corresponding
(d + 1) hyperplanes. The resulting d-volume function (which can be interpreted as a measure of disagreement
between the (d + 1)-tuples of hyperplanes) is geometrical $\ell_1$.

The proof is quite similar to that of Theorem 9. It suffices to show that for each $p \in \mathbb{R}^d$, the set of d-simplices
$\sigma$ corresponding to the (d + 1)-tuples of hyperplanes containing p in their bounded cell is a (geometric)
hypercut. Indeed, map each hyperplane h to $p_h \in \mathbb{R}^d$, the foot of the perpendicular from p to h. Clearly, p is
contained in the bounded cell of some (d + 1) hyperplanes {h} iff p belongs to the geometrical simplex $\{p_h\}$.
The conclusion follows.

While so far our basic notions (i.e., boundary operator, cycles, and coboundaries) were over $\mathbb{Z}_2$, in the
context of the geometric $\ell_1$ volumes it will be helpful to (briefly) discuss the corresponding theory over $\mathbb{R}$. The
presentation is not going to be entirely self-contained, and we refer the reader to the first chapters of [24] for
the background.

As before, we consider the $\binom{n}{d} \times \binom{n}{d+1}$ incidence matrix $M_d$ over the reals, whose rows are indexed by
(arbitrarily oriented) (d - 1)-simplices, and the columns are indexed by (arbitrarily oriented) d-simplices. This
time, $M_d(\tau, \sigma) = 1$ if $\tau \subset \sigma$ and its orientation is consistent with the orientation induced by $\sigma$ on its boundary,
$M_d(\tau, \sigma) = -1$ if $\tau \subset \sigma$ but the orientations are inconsistent, and $M_d(\tau, \sigma) = 0$ if $\tau \not\subset \sigma$.

The boundary operator $\partial : K_n^{(d)} \to K_n^{(d-1)}$ is defined by $M_d 1_\sigma = 1_{\partial\sigma}$, and is linearly extended to act
on formal sums of d-simplices with real coefficients. A d-coboundary $B \in \mathbb{R}^{\binom{n}{d+1}}$ (i.e., a real function on
d-simplices) is a vector in the left image of $M_d$. That is, $B^T = x^T M_d$ for some $x \in \mathbb{R}^{\binom{n}{d}}$.

An equivalent definition of a real d-coboundary, based on the fact that $H^{d}(K_n^{(d+1)}, \mathbb{R}) = 0$, is: $B \in \mathbb{R}^{\binom{n}{d+1}}$
is a real d-coboundary iff it sums up to 0 on the boundary of any (d + 1)-simplex. I.e., $B^T M_{d+1} = 0$.
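
The two facts used repeatedly below, namely that every real d-coboundary vanishes on boundaries of (d + 1)-simplices and that the rank of $M_d$ is $\binom{n-1}{d}$, can be checked on a small instance. The following sketch fixes one particular orientation (vertices in increasing order); the helper names `incidence`, `matmul` and `rank` are ours and the choice n = 5, d = 2 is arbitrary.

```python
from fractions import Fraction
from itertools import combinations
from math import comb

def incidence(n, d):
    """Real incidence matrix M_d: rows are d-subsets, columns are (d+1)-subsets of range(n).
    Simplices are oriented by increasing vertex order, so the face obtained by
    deleting the i-th vertex gets sign (-1)**i."""
    rows = {t: r for r, t in enumerate(combinations(range(n), d))}
    cols = list(combinations(range(n), d + 1))
    M = [[0] * len(cols) for _ in rows]
    for c, s in enumerate(cols):
        for i in range(d + 1):
            M[rows[s[:i] + s[i + 1:]]][c] = (-1) ** i
    return M

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def rank(M):
    """Rank over the rationals by Gaussian elimination."""
    M = [[Fraction(x) for x in row] for row in M]
    r = 0
    for c in range(len(M[0])):
        pivot = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if pivot is None:
            continue
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(r + 1, len(M)):
            factor = M[i][c] / M[r][c]
            M[i] = [x - factor * y for x, y in zip(M[i], M[r])]
        r += 1
    return r

n, d = 5, 2
M_d, M_next = incidence(n, d), incidence(n, d + 1)
product = matmul(M_d, M_next)    # all zeros, so x^T M_d M_{d+1} = 0 for every x
```

Because the product $M_d M_{d+1}$ vanishes, every vector of the form $x^T M_d$ sums up to 0 on the boundary of each (d + 1)-simplex.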

Definition 5 A real nonnegative function $F : K_n^{(d)} \to \mathbb{R}_+$ is exact if it is an (entrywise) absolute value of a
real d-coboundary of $K_n^{(d)}$.

A real nonnegative function $F : K_n^{(d)} \to \mathbb{R}_+$ is of negative type if it is a sum of (entrywise) squares of real
d-coboundaries of $K_n^{(d)}$.

The exact d-volumes can be viewed as a d-dimensional analog of line metrics. Observe that exactness does
not depend on the orientations used in the definition of $M_d$. Observe also that in the alternative theory where
the generalized triangle inequality of Eq. (1) is required to hold only for orientable cycles (i.e., cycles over $\mathbb{R}$),
the exact functions are d-volumes. However, in some important cases they are d-volumes according to our original
definition as well:

Theorem 11 Cut volumes corresponding to geometrical d-hypercuts are exact. So are the Euclidean d-volumes
realizable in $\mathbb{R}^d$. Consequently, geometrical $\ell_1$ d-volumes, as well as the sums of squares of Euclidean
d-volumes, are of negative type.

Proof (Sketch) Consider a realization of $K_n^{(d)}$ in $\mathbb{R}^d$ defining the geometrical hypercut C, or the Euclidean
d-volume under consideration, with all d-simplices oriented in the same manner. I.e., left to right for d = 1,
clockwise for d = 2, etc. Observe that the origin is contained in either zero or two d-simplices belonging to the
boundary of any (d + 1)-simplex $\zeta$. In the latter case one of these simplices is necessarily oriented in a manner
consistent with the orientation induced by $\zeta$, and the other is not. Hence, $(\mathrm{vol}_C^{(d)})^T M_{d+1} = 0$, and thus $\mathrm{vol}_C^{(d)}$
is exact. The Euclidean volume, which is the integral of geometrical cut volumes defined by all $p \in \mathbb{R}^d$ with
respect to a fixed realization of $K_n^{(d)}$, must also be exact by a linearity argument.

The second statement directly follows from the first for geometrical $\ell_1$ d-volumes, as cut volumes take
values 0/1. For general Euclidean volumes, recall that the square of any Euclidean d-volume (no matter in what
dimension it is realized) is the sum of squares of its projections on all subsets of d coordinates. I.e., it is a sum
of squares of Euclidean d-volumes realizable in $\mathbb{R}^d$. ■

To conclude this section, observe that Theorem 11 provides an alternative proof of Theorem 5, and in fact a
bit more: a geometrical hypercut intersects not only every $\mathbb{Z}_2$-hypertree, but also any $\mathbb{R}$-hypertree. Indeed, any
d-coboundary of $K_n^{(d)}$, in particular an appropriately signed $v_C$, that takes value 0 on a basis of the space of
columns of $M_d$ (i.e., on an $\mathbb{R}$-hypertree), must be identically 0 on $K_n^{(d)}$, contrary to the definition of $v_C$.

3.4 Dimension Reduction for $\ell_1$ Metrics and Volumes

Given an $\ell_1$ d-volume $\mathrm{vol} = \sum_{C \in \mathcal{C}} \lambda_C \cdot v_C$, where $\mathcal{C}$ is a collection of d-hypercuts, $v_C$ is the cut volume
associated with C, and $\lambda_C$ are positive reals, $|\mathcal{C}|$ is the cut-dimension of this particular representation of vol.
We define the cut-dimension of vol as the minimum possible cut-dimension of any representation of it.

Let the cut cone be the convex cone formed by all $\ell_1$ d-volumes on $K_n^{(d)}$. The extremal rays of this cone
are the cut d-volumes.

Claim 3.1 The cut cone has full dimension. 

Proof Assume that a function $f : K_n^{(d)} \to \mathbb{R}$ sums up to 0 on every hypercut (and therefore, by Theorem 1,
on any d-coboundary of $K_n^{(d)}$). It suffices to show that f is identically 0. Let $\sigma$ be any d-simplex in $K_n^{(d)}$ and
let $\tau_1, \tau_2$ be distinct (d - 1)-dimensional faces of $\sigma$. Let $B_1$, $B_2$ and $B_{12}$ be the d-coboundaries in $K_n^{(d)}$ induced
by $\tau_1$, $\tau_2$ and $\{\tau_1, \tau_2\}$ respectively. Then, $0 = f(B_1) + f(B_2) - f(B_{12}) = 2f(\sigma)$, and the claim follows. ■

Since the cut cone is a subset of $\mathbb{R}^{\binom{n}{d+1}}$, Carathéodory's Theorem implies that the cut-dimension of any $\ell_1$ d-volume
is at most $\binom{n}{d+1}$. Moreover, since the cut cone has full dimension, all but a 0-measure subset of $\ell_1$ d-volumes
have precisely this cut-dimension.

The dimension reduction phenomenon is the dramatic drop in the cut-dimension when one is allowed to
replace an $\ell_1$-volume vol by an $\epsilon$-close $\ell_1$-volume vol'. The proximity in our case is measured by the point-wise
ratio between vol and vol', which should lie within $(1 \pm \epsilon)$. I.e., the multiplicative distortion between vol and
vol' is $\le \frac{1+\epsilon}{1-\epsilon}$.

We show that the dimension reduction phenomenon occurs for $\ell_1$-volumes for any d. For d = 1 and, more
generally, for geometrical d-volumes of any dimension, we refine the argument and get a better bound. In
order to do this, we rely on some general sparsification tools to be developed and discussed in detail in the next
section. Here we present only the statements of these results, and then proceed to apply them in our setting.

The geometric formulation is as follows. Let $\mathcal{C}$ be a family of nonnegative vectors in $\mathbb{R}^m$, and let cone($\mathcal{C}$)
be the convex cone spanned by it. The goal is, given a vector $w \in \mathrm{cone}(\mathcal{C})$, to produce a small subset $\mathcal{C}' \subseteq \mathcal{C}$
and a vector $w' \in \mathrm{cone}(\mathcal{C}')$ that (pointwise) approximates w up to a multiplicative factor of $1 \pm \epsilon$.

The same can be conveniently reformulated in the matrix notation. Let M be an $m \times |\mathcal{C}|$ real nonnegative
matrix. Then, given a nonnegative vector $\lambda \in \mathbb{R}^{|\mathcal{C}|}$, the goal is to produce a new $\lambda' \in \mathbb{R}^{|\mathcal{C}|}$ such that on one
hand $w' = M\lambda'$ approximates $w = M\lambda$ up to a multiplicative factor of $1 \pm \epsilon$, and on the other hand $\lambda'$ has
small support. The columns of M are the vectors of $\mathcal{C}$, and $\lambda, \lambda'$ are coefficients of nonnegative combinations
of these vectors.

An upper bound on the size of the support of $\lambda'$ will be given in terms of certain parameters of the matrix M
alone, not depending on $\lambda$.
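
To make the matrix formulation concrete, the following sketch checks whether a candidate weighting $\lambda'$ is a $(1 \pm \epsilon)$ approximation of $\lambda$ in the above sense. The function names and the toy matrix are assumptions of ours, chosen only for illustration.

```python
def apply(M, lam):
    """w = M lam for a matrix given as a list of rows."""
    return [sum(m * l for m, l in zip(row, lam)) for row in M]

def is_eps_approximation(M, lam, lam_new, eps):
    """True iff (1 - eps) w <= w' <= (1 + eps) w entrywise, with w = M lam, w' = M lam_new."""
    w, w_new = apply(M, lam), apply(M, lam_new)
    return all((1 - eps) * a <= b <= (1 + eps) * a for a, b in zip(w, w_new))

def support_size(lam):
    return sum(1 for x in lam if x != 0)

# Hypothetical instance: the first two columns coincide, so one of them is redundant.
M = [[1, 1, 0],
     [0, 0, 1],
     [1, 1, 1]]
lam = [1, 1, 1]
lam_sparse = [2, 0, 1]      # moves all weight of column 1 onto column 0
lam_bad = [1, 0, 1]         # simply drops column 1
```

Here `lam_sparse` reproduces $M\lambda$ exactly with a smaller support, whereas `lam_bad` halves the first coordinate and so fails for, say, $\epsilon = 0.25$.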

Definition 6 The triangular rank of a matrix M, trk(M), is the size of the largest lower-triangular square
minor of M with strictly positive diagonal. The rows and the columns of the minor may appear in an order
different from that of M.

The square-root rank of a nonnegative matrix M, rank*(M), is the minimum possible rank (over $\mathbb{R}$) of
a matrix Q where $Q_{ij} = \pm\sqrt{M_{ij}}$. In particular, if M is Boolean, then Q ranges over all possible signings
$\pm M$ of M.

Theorem 12 Let M be an $m \times |\mathcal{C}|$ nonnegative matrix as before, and let $\lambda$ be a nonnegative weighting of $\mathcal{C}$.
Then, for any $1 > \epsilon > 0$, there exists (and is efficiently constructible) another nonnegative weighting $\lambda'$ of $\mathcal{C}$
such that $M\lambda'$ approximates $M\lambda$ up to a multiplicative factor of $1 \pm \epsilon$, and $|\mathrm{supp}(\lambda')| = O(\mathrm{rank}^*(M)/\epsilon^2)$.
If M is Boolean, a different construction yields the same with $|\mathrm{supp}(\lambda')| = O(\mathrm{trk}(M) \cdot \log m/\epsilon^2)$.

Since $|\mathcal{C}|$ can be arbitrarily large or even infinite, "efficiently constructible" requires further explanation. The
input to the procedure is not the entire M and $\lambda$, but only the nonzero values of $\lambda$, and the columns of
M corresponding to them. The complexity is measured in terms of this input. We further comment that
$\mathrm{supp}(\lambda') \subseteq \mathrm{supp}(\lambda)$.
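
The triangular rank of Definition 6 can be computed by exhaustive search on small matrices, which helps in getting a feel for the parameter. The sketch below is a brute-force illustration of ours (exponential time, not the construction behind Theorem 12): it looks for the longest sequence of (row, column) pairs with positive diagonal entries and zeros above the diagonal.

```python
def triangular_rank(M):
    """Brute-force trk(M): the longest sequence of (row, column) pairs such that
    M[r_i][c_i] > 0 and M[r_j][c_i] == 0 for every earlier row r_j."""
    n_rows, n_cols = len(M), len(M[0])

    def extend(used_rows):
        best = len(used_rows)
        for c in range(n_cols):
            if any(M[r][c] != 0 for r in used_rows):
                continue                     # the new column must vanish on earlier rows
            for r in range(n_rows):
                if M[r][c] > 0:              # strictly positive diagonal entry
                    best = max(best, extend(used_rows + [r]))
        return best

    return extend([])

identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
all_ones = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
lower = [[1, 0, 0], [1, 1, 0], [1, 1, 1]]
```

The all-ones matrix has triangular rank 1 although it has many nonzero entries, while a lower-triangular matrix with positive diagonal attains the full value.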

We are now ready to address the dimension reduction for d-volumes. We start with general d.

Theorem 13 Let vol be an $\ell_1$ d-volume on n points, and let $0 < \epsilon < 1$ be a constant. Then there exists (and is
efficiently constructible) an $\ell_1$ d-volume vol' that distorts vol by at most a multiplicative factor of $\frac{1+\epsilon}{1-\epsilon}$, and the
cut-dimension of vol' is at most $O(n^d \log n/\epsilon^2)$, thus improving the trivial $O(n^{d+1})$.

Proof Let M be a $\binom{n}{d+1} \times |\mathcal{C}|$ Boolean matrix whose rows are indexed by d-simplices, the columns are
indexed by d-hypercuts, and $M(\sigma, C) = 1$ if $\sigma$ belongs to the cut C and 0 otherwise. Observe that the vectors $M\lambda$
correspond to $\ell_1$ d-volumes on $K_n^{(d)}$, and $|\mathrm{supp}(\lambda)|$ is an upper bound on the cut-dimension of the respective
d-volume. Thus, Theorem 12 applies, yielding an upper bound of $O(\mathrm{trk}(M) \cdot d \log n/\epsilon^2)$ on the cut-dimension.
It remains to upper-bound trk(M). It turns out to be at most $\binom{n-1}{d}$.

Indeed, let Q be a square $N \times N$ lower-triangular nonsingular minor of M. Let the rows be indexed
by $\{\sigma_i\}_{i=1}^{N}$, and the columns be indexed by $\{C_j\}_{j=1}^{N}$ in this order. It means, in particular, that $\sigma_i \in C_i$, but
$\sigma_i \notin C_j$ for j > i. We claim that the set of d-simplices $\{\sigma_i \mid i = 1, \ldots, N\}$ does not contain d-cycles. Indeed,
assume by contradiction that it does contain a cycle Z, and let r be the largest index such that $\sigma_r \in Z$. Consider
the corresponding d-cut $C_r$. Since $\sigma_r \in Z \cap C_r$, by Claim 2.1, $C_r$ must contain another d-simplex from Z,
contrary to the fact that $\sigma_i \notin C_r$ for every i < r.

Thus, $\{\sigma_i \mid i = 1, \ldots, N\}$ is acyclic, and N is bounded by the size of the maximum acyclic subcomplex,
i.e., a d-hypertree, which is $\binom{n-1}{d}$. ■

The special case of d = 1 is precisely the much studied problem of dimension reduction for $\ell_1$-metrics.
While the elegant lower bounds of [9, 15] show that one may at best hope for polynomial (and not logarithmic)
dimension reduction, the best known upper bound of [27] asserts that $C_\epsilon\, n \log n$ dimensions suffice for
$1 + \epsilon$ distortion. Theorem 13 yields the same upper bound, however it strengthens [27] by claiming it for the
cut-dimension, which is larger than the usual geometric dimension of the host $\ell_1$-space. Further improvement is
provided by using a different method.

Theorem 14 Let $0 < \epsilon < 1$ and let d be an $\ell_1$-metric on n points. Then, there exists (and is explicitly
constructible) an $\ell_1$-metric d' such that $\mathrm{dist}(d, d') \le \frac{1+\epsilon}{1-\epsilon}$, while the cut-dimension of d' is at most $O(n/\epsilon^2)$.

Proof Let M be the $\binom{n}{2} \times |\mathcal{C}|$ Boolean matrix as in the proof of Theorem 13 with d = 1. We claim that
rank*(M) is at most n. This, in view of Theorem 12, yields the desired bound.

Let B be a $|\mathcal{C}| \times n$ matrix whose rows are indexed by cuts, and columns by vertices. For a cut
$C = E(A, \bar{A})$, let $B(C, v) = 1$ if $v \in A$, and $-1$ otherwise. Let X be an $n \times \binom{n}{2}$ matrix with rows indexed by V and
columns indexed by arbitrarily directed edges. Let $X(v, e) = 0.5$ if v is the source of e, $X(v, e) = -0.5$ if v is
the sink of e, and $X(v, e) = 0$ otherwise. Observe that $(BX)^T = \pm M$, and $\mathrm{rank}(BX) \le n$. ■
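
The factorization used in the proof of Theorem 14 is easy to verify exhaustively for small n. The sketch below builds B and X for n = 5, with each edge (u, v), u < v, directed from u to v and each cut represented by its side containing vertex 0; these conventions and the variable names are our own.

```python
from itertools import combinations

n = 5
V = range(n)
edges = list(combinations(V, 2))                     # edge (u, v) is directed from u to v
# Each cut E(A, complement of A) is listed once, via the side A that contains vertex 0.
cuts = [frozenset((0,) + rest) for k in range(n - 1) for rest in combinations(range(1, n), k)]

B = [[1 if v in A else -1 for v in V] for A in cuts]                      # |cuts| x n
X = [[0.5 if v == e[0] else -0.5 if v == e[1] else 0 for e in edges] for v in V]   # n x C(n,2)
BX = [[sum(B[i][v] * X[v][j] for v in V) for j in range(len(edges))] for i in range(len(cuts))]

# Transpose of the Boolean cut matrix M: entry 1 iff the edge crosses the cut.
M_T = [[1 if (u in A) != (v in A) else 0 for (u, v) in edges] for A in cuts]
```

Every entry of BX equals 0 or $\pm 1$ and its absolute value matches the corresponding entry of the cut matrix, so BX is a signing of $M^T$ that factors through an n-dimensional space.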

Interestingly, M has full rank, as follows from Claim 3.1, and thus M is an example of a Boolean matrix
with rank*(M) roughly the square root of its rank. Note that by a standard tensor product argument, rank*(M)
can never be smaller than that.

One may wonder how tight the bound of Theorem 14 is. As we shall see, in terms of the dependence on n
it is best possible.

Theorem 15 Let $d_{n+1}$ be the shortest path metric of the unweighted path $P_{n+1}$, i.e., $d_{n+1}(i, j) = |i - j|$. This
is certainly an $\ell_1$ metric. However, any metric $d' = \sum_{C \in \mathcal{C}'} \lambda_C \cdot \delta_C$ where $|\mathcal{C}'| \le n/t$ distorts d by at least t.

Proof Since multiplicative distortion is not sensitive to scaling, we may assume without loss of generality
that d dominates d'. This implies that each $\lambda_C$ is at most 1, as C must separate some pair of adjacent vertices
k - 1, k, and $1 = d(k - 1, k) \ge d'(k - 1, k) \ge \lambda_C$. But then all the distances in d', and in particular
d'(1, n + 1), are at most $|\mathcal{C}'| \le n/t$, and the statement follows. ■
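
The mechanism behind Theorem 15 can be observed on a toy instance. In the sketch below the path has vertices 0..n, the path metric is written as a sum of n prefix cuts, and a hypothetical family of only m cuts (cut r separates the adjacent pair (k - 1, k) exactly when k ≡ r mod m) is used with unit weights; the construction and names are ours, meant only to illustrate the counting argument.

```python
n, m = 9, 3                               # path on vertices 0..n, and m cuts with m = n / t

def prefix_cut(k, i, j):
    """Cut metric of {0..k-1} | {k..n}: it separates i and j iff min < k <= max."""
    return 1 if min(i, j) < k <= max(i, j) else 0

def d(i, j):
    """Path metric as a sum of the n prefix cuts."""
    return sum(prefix_cut(k, i, j) for k in range(1, n + 1))

def side(i, r):
    """Side of vertex i in cut r: parity of the number of k in 1..i with k % m == r."""
    return sum(1 for k in range(1, i + 1) if k % m == r) % 2

def d_sparse(i, j):
    """Sum of the m unit-weight cut metrics of the sparse family."""
    return sum(1 for r in range(m) if side(i, r) != side(j, r))

pairs = [(i, j) for i in range(n + 1) for j in range(i + 1, n + 1)]
```

The sparse metric agrees with d on adjacent vertices and is dominated by d, yet the endpoints are at distance at most m, so the pair (0, n) is contracted by a factor of at least n/m.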

Finally, our third dimension-reduction result is about geometrical $\ell_1$ d-volumes. Since for d = 1 all
hypercuts are geometrical, it is a nontrivial generalization of Theorem 14.

Theorem 16 Let vol be a geometric $\ell_1$ d-volume on n points, and let $0 < \epsilon < 1$ be a constant. Then there
exists (and is efficiently constructible) a geometric $\ell_1$ d-volume vol' that distorts vol by at most a multiplicative
factor of $\frac{1+\epsilon}{1-\epsilon}$, and the cut-dimension of vol' is at most $O(n^d/\epsilon^2)$, thus improving Theorem 13 in this important
special case.

Proof Consider the $\binom{n}{d+1} \times |\mathcal{C}|$ Boolean matrix as in the proof of Theorem 13, only this time $\mathcal{C}$ is the family
of all geometrical hypercuts. Call this matrix P. Since by Theorem 11 a geometrical d-hypercut volume is the
(entrywise) absolute value of a real d-coboundary of $K_n^{(d)}$, we conclude that for every $C \in \mathcal{C}$ there exists $x_C \in \mathbb{R}^{\binom{n}{d}}$ such that the C-column
of P is equal to $\pm x_C^T M_d$, where $M_d$ is as in the definition of the real d-coboundary. Forming a matrix X
from the vectors $\{x_C\}_{C \in \mathcal{C}}$, we conclude that $P^T = \pm X^T M_d$. Hence, $\mathrm{rank}^*(P) \le \mathrm{rank}(M_d) = \binom{n-1}{d}$. Thus, by
Theorem 12 we obtain an upper bound of $O(n^d/\epsilon^2)$ on the cut-dimension of the approximating geometrical $\ell_1$
d-volumes. ■



3.5 Some Remarks and Applications 

3.5.1 High Dimensional Sparsifiers and Approximating Forms. 

One of the main results of [6] claims that every (nonnegatively) weighted graph G has a $(1 \pm \epsilon)$-sparsifier
G' of size $O(n/\epsilon^2)$. That is, for every $x \in \mathbb{R}^n$, the two forms $F_G(x) = \sum_{\{i,j\} \in E(G)} w_{ij}(x_i - x_j)^2$ and
$F_{G'}(x) = \sum_{\{i,j\} \in E(G')} w'_{ij}(x_i - x_j)^2$ differ by at most a $(1 \pm \epsilon)$ multiplicative factor, where $E(G') \subseteq E(G)$
and $|E(G')| = O(n/\epsilon^2)$. The authors of [6] further argue that such sparsifiers of the complete graph $K_n$ have
many common properties with (almost optimal) regular expanders of degree and in fact should be 

treated as such, despite the weights and the irregular degrees. 

Using a convexity argument, one can re-define sparsifiers as above in terms of metric spaces: G' is a
sparsifier of G as above iff the two forms $F_G(d) = \sum_{\{i,j\} \in E(G)} w_{ij}\, d(i,j)$ and $F_{G'}(d) = \sum_{\{i,j\} \in E(G')} w'_{ij}\, d(i,j)$
are $(1 \pm \epsilon)$-close for every negative type distance d on V(G). This simple observation already has interesting
consequences. E.g., it implies that in order to $(1 + \epsilon)$-approximate the average distance of a metric of negative
type, it suffices to query $O(n/\epsilon^2)$ values (according to the suitable w'), and thus it can be done in sublinear time.
This somewhat surprising corollary was established earlier for Euclidean metrics (a special case of negative
type metrics) by using a different argument in [3], in turn improving upon an earlier result of P. Indyk.

The general framework of Section 3.3 together with the original argument of [6] allows us to extend the above
results to higher dimensions.

Theorem 17 For every weighted simplicial complex K of dimension d there exists a sparsifier K' such that
the two d-forms $F_K(v^{(d)}) = \sum_{\sigma \in K} w_K(\sigma)\, v^{(d)}(\sigma)$ and $F_{K'}(v^{(d)}) = \sum_{\sigma \in K'} w_{K'}(\sigma)\, v^{(d)}(\sigma)$ differ by at most a
$(1 \pm \epsilon)$ multiplicative factor on any function $v^{(d)}$ of negative type, and $|K'| = O(n^d/\epsilon^2)$. I.e., K' has a constant
average degree (measured with respect to (d - 1)-simplices).

Proof A proof based on Theorem 12 is quite natural here, but we prefer the original argument of [6] on
which the latter theorem is based. Keeping in mind that the functions of negative type are nonnegative
combinations of (entrywise) squares of real d-coboundaries, it suffices to establish the statement for squares of
real d-coboundaries.

Recall that a real d-coboundary $B_x \in \mathbb{R}^{\binom{n}{d+1}}$ is defined by a vector $x \in \mathbb{R}^{\binom{n}{d}}$ by $B_x^T = x^T M_d$, where
$M_d$ is the real incidence matrix as in Section 3.3. Thus, $F_K(B_x^2) = x^T \cdot (M_d W_K M_d^T) \cdot x$, where $W_K$ is a
diagonal $\binom{n}{d+1} \times \binom{n}{d+1}$ matrix indexed by d-simplices, where $W_K(\sigma, \sigma) = w_K(\sigma)$. Applying Theorem 21 to
the matrix $M_d W_K M_d^T = (M_d \sqrt{W_K}) \cdot (\sqrt{W_K} M_d^T)$ we conclude that there is another weighting w' such that
$|\mathrm{supp}(w')| = O(\mathrm{rank}(M_d)/\epsilon^2)$, and $x^T \cdot (M_d W_K M_d^T) \cdot x$ and $x^T \cdot (M_d W' M_d^T) \cdot x$ differ by at most a $(1 \pm \epsilon)$
multiplicative factor. Keeping in mind that $\mathrm{rank}(M_d) = \binom{n-1}{d}$, and defining K' as the support of w', we
arrive at the desired conclusion. ■

As a bonus we get a sublinear algorithm for approximating the average value of functions of negative type,
in particular the Euclidean d-volumes, and the geometric d-volumes:

Corollary 3.1 In order to $(1 + \epsilon)$-approximate the average value of a function of negative type, it suffices to
query $O(n^d/\epsilon^2)$ predefined (and efficiently computable) (d + 1)-tuples forming a high-dimensional sparsifier
of (constant) average degree.

3.5.2 Sparse Spanners. 

It is well known that the average degree in a graph H with n vertices and girth g is $n^{O(1)/g}$. Since (see [4]) the
shortest-path metric $d_G$ of a weighted graph G can be (g - 1)-approximated by that of its subgraph H of girth
g, there exists a g-spanner of G with at most $n^{1 + O(1)/g}$ edges. The construction naturally carries over to volumes,
which brings us to a question: What is the maximal number of d-simplices in a simplicial complex K on n
vertices, such that the smallest d-cycle of K is of size $\ge g$? The probabilistic construction of Lemma 3.2 (with
small local amendments) shows that for d = 2 there exists K of average degree O(log n), and the smallest
cycle of size $\tilde{\Omega}(n^{1/5})$. (By degree of a 1-simplex e we mean the number of 2-simplices in K that contain e.)
Thus, the situation for d = 2 significantly differs from the graph theoretic case. It would be interesting to get
tighter bounds for this problem. See also [18] for a somewhat related discussion.

3.5.3 On $c_1(K)$.

Like in graphs, given a d-complex K one may ask what is the worst possible distortion of approximating
$\mathrm{vol}_K$, a lightest-cap volume of K (over all choices of nonnegative weights of its simplices), by an $\ell_1$ volume.
This important numerical parameter is called (by analogy with graphs) $c_1(K)$. One of the most important open
questions in the theory of finite metric spaces is whether any graph G lacking a fixed minor has a constant
$c_1(G)$ (see e.g., [13] for a related discussion and partial results). It is natural to ask a similar question about
d-complexes: what properties of K would imply a nontrivial upper bound on $c_1(K)$? The techniques of [13]
imply this: $c_1(K) \le 2^{\chi(K)}$, where K (as usual) is assumed to have a complete (d - 1)-skeleton and $\chi(K)$ is
the Euler characteristic of K. The construction proceeds via repeatedly picking a minimal cycle, and removing
a random d-simplex in it with probability proportional to its volume. The lightest-cap volume of the random
(sub-)hypertree of K obtained in this manner dominates $\mathrm{vol}_K$, yet stretches it (in expectation) by only a constant
factor.

4 Abstract Sparsification Techniques



As already indicated above, the general problem to be discussed in this part of the paper is as follows. Let
$\mathcal{C}$ be a family of nonnegative vectors in $\mathbb{R}^m$, and let cone($\mathcal{C}$) be the convex cone spanned by it. The goal is,
given a vector $w \in \mathrm{cone}(\mathcal{C})$, to produce a small subset $\mathcal{C}' \subseteq \mathcal{C}$ and a vector $w' \in \mathrm{cone}(\mathcal{C}')$ that (pointwise)
approximates w up to a multiplicative factor of $1 \pm \epsilon$.

Using the matrix notation, let M be an $m \times |\mathcal{C}|$ real nonnegative matrix. Then, given a nonnegative vector
$\lambda \in \mathbb{R}^{|\mathcal{C}|}$, the goal is to produce a new $\lambda' \in \mathbb{R}^{|\mathcal{C}|}$ such that on one hand $w' = M\lambda'$ approximates $w = M\lambda$ up to
a multiplicative factor of $1 \pm \epsilon$, and on the other hand $\lambda'$ has small support. The columns of M are the vectors
of $\mathcal{C}$, and $\lambda, \lambda'$ are coefficients of nonnegative combinations of these vectors. For computational purposes, we
assume that the input to the procedure is not M and $\lambda$, but only the nonzero values of $\lambda$, and the columns of
M corresponding to them. It will always hold that $\mathrm{supp}(\lambda') \subseteq \mathrm{supp}(\lambda)$.

We seek to single out the relevant parameters of the matrix M such that $|\mathrm{supp}(\lambda')|$ as above can be
upper-bounded in terms of these parameters alone, not depending on $\lambda$. The problem appears to be of a fundamental
nature, far transcending the particular context of the previous sections (there are some additional examples at
the end of this section). We initiate the study of this problem here, and produce two families of such parameters
yielding the desired upper bounds. The first result is restricted to Boolean matrices, the other is more general
but weaker (if one ignores a log m factor, which in fact is not always ignorable). Both results are almost tight in
special cases; the conditions they provide are sufficient, but apparently not necessary. Importantly, they are inherently limited to $0 < \epsilon < 1$.
The situation for large $\epsilon$'s appears to be radically different, and calls for further study.

In what follows, it will be convenient and combinatorially justified to interpret M as $M_{|\mathcal{F}| \times |\mathcal{C}|}$, an
'incidence' matrix of a quantitative relation between the members of a family $\mathcal{F}$ (indexing the rows) and the family
$\mathcal{C}$ (indexing the columns). In this interpretation, $\lambda = \{\lambda_c\}_{c \in \mathcal{C}}$ is a weighting of $\mathcal{C}$ that induces a weighting
$w = \{w_f\}_{f \in \mathcal{F}}$ on $\mathcal{F}$ by assigning $w(f) = \sum_{c \in \mathcal{C}} M(f, c)\,\lambda_c$. I.e., w(f) is the weighted sum of all the columns
incident to f. For example, in Theorem 13, $\mathcal{F}$ stands for the family of d-simplices, and $\mathcal{C}$ stands for the family
of d-hypercuts. The relation represented by the corresponding M is the membership: $M(\sigma, C) = 1$ if $\sigma \in C$,
and $M(\sigma, C) = 0$ otherwise.

4.1 The First Technique 

We restrict our attention to Boolean matrices M. The key parameter of M will be its triangular rank. Recall 
that the triangular rank of M, trk(M), is the size of the largest nonsingular lower-triangular square minor of 
M. The rows and the columns of the minor may appear in order different from that of M. 

Theorem 18 Let M be a 0/1 matrix as before, A a nonnegative weighting ofC, and w = MX. Then, for any 

< e < 0, there exists (and is efficiently constructible) another nonnegative weighting a ofC, such that the 
support of a is of size at most 0{ trk(M) • logm| / e^), and w' = Ma (entrywise) distorts w by at most (1 it e) 
multiplicative factor 

Proof The method of proof is inspired by the method of Karger and Benczur from [7]. 

The existence of $\alpha$ will be established using a probabilistic argument. We start with some preparatory observations and tools. Let $\mathrm{Col}_c$ be the column of $M$ indexed by $c \in C$. Making $\lambda_c$ copies of each column $\mathrm{Col}_c$, $c \in C$, we arrive at the new $M'$ with the same triangular rank, and $w = M'\mathbf{1}$, i.e., $\lambda$ becomes an all-1 vector, and $w$ is the sum of columns. We assume that w.l.o.g., this is the original input. (Of course, $\lambda_c$ may not be an integer, but we take for the sake of the proof infinitesimal units, and use the scalability of the problem. The algorithmic issues will be addressed later.) In addition, w.l.o.g., we assume that $M$ does not have all-0 columns.

As we are about to sample the columns of $M$, notice that some columns are more essential for $w$ than the others, and thus the sampling is necessarily non-uniform. For example, if a certain column $\mathrm{Col}_c$ is the only column of $M$ such that $\mathrm{Col}_c(f) > 0$ for some $f \in \mathcal{F}$, and $w_c > 0$, then $\mathrm{Col}_c$ must necessarily be chosen. More generally, if the row of some $f \in \mathcal{F}$ has small support, the columns corresponding to this support should be sampled with relatively high probability. This motivates the following definition, analogous to the strength of an edge in [7]:

Definition 7 (The strength of a column) Let $M$, $w = M\mathbf{1}$, be as above. The function $s : C \to \mathbb{N}$, assigning to each column of $M$ a strength value, is defined by the following iterative process:

Let $M^* = M$, $w^* = w$, and $m = \min^+_{f \in \mathcal{F}} w(f)$, where $\min^+$ is the smallest strictly positive entry of $w$. While $M^*$ is not all 0, repeat:

1. While there is $f \in \mathcal{F}$ such that $0 < w^*(f) \le m$, do:

Assign $s(c) = m$ for every $c$ in the support of the $f$-row, $\mathrm{Row}_f$, of $M^*$. For every such $c$ set $\mathrm{Col}_c$ to 0 to get a new $M^*$, and update $w^*$ to the new sum of columns of $M^*$.

2. If $w^*$ is not identically 0, set $m = \min^+_{f \in \mathcal{F}} w^*(f)$, and return to (1).

Observe that while the order in which $f$'s are chosen in (1) is somewhat arbitrary, at each invocation of (2) the set of columns set to 0, and the new value of $m$, are uniquely defined, and do not depend on the order of choices made in (1). Thus the strength function is well defined. Observe also that identical columns necessarily get identical strengths. Finally, observe that the value of the strength never decreases along the run of the process above.
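The iterative process of Definition 7 can be sketched as follows for a small 0/1 matrix with unit column weights. This is a direct, unoptimized transcription under the assumption stated in the text that there are no all-0 columns; the function name `column_strengths` and the toy matrices are hypothetical. The function also counts phases, so the inequality $\sum_c 1/s(c) \le N$ of Lemma 4.1 can be checked on examples.

```python
def column_strengths(M):
    """Return (strengths, number_of_phases) for a 0/1 matrix M given as a list of rows.

    Assumes unit column weights and no all-0 columns, as in the text.
    """
    n_rows, n_cols = len(M), len(M[0])
    alive = [True] * n_cols          # columns not yet set to 0 in M*
    strength = [None] * n_cols
    phases = 0

    def w_star(f):
        # Current row sum of M*, i.e. w*(f).
        return sum(M[f][c] for c in range(n_cols) if alive[c])

    while any(alive):
        # Step (2): m is the smallest strictly positive entry of w*.
        m = min(w_star(f) for f in range(n_rows) if w_star(f) > 0)
        # Step (1): each pass of this inner loop is one phase.
        while True:
            f = next((g for g in range(n_rows) if 0 < w_star(g) <= m), None)
            if f is None:
                break
            for c in range(n_cols):
                if alive[c] and M[f][c]:
                    strength[c] = m
                    alive[c] = False
            phases += 1
    return strength, phases

M1 = [[1, 0, 0, 0],
      [0, 1, 1, 1],
      [0, 1, 1, 1],
      [0, 1, 1, 1]]
M2 = [[1, 1, 0], [1, 0, 1], [0, 1, 1]]

print(column_strengths(M1))   # ([1, 3, 3, 3], 2)
print(column_strengths(M2))   # ([2, 2, 2], 2)
```

In both toy cases the sum of reciprocal strengths (2 and 1.5, respectively) does not exceed the number of phases.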

Definition 8 Let $C$ be the column indices as above, and let $s_1 < s_2 < \cdots < s_t$ be the sequence of corresponding strengths in increasing order. Define $C_i = \{c \in C \mid s(c) \ge s_i\}$, and $w_i = \sum_{c \in C_i} \mathrm{Col}_c$, $i = 1, 2, \ldots, t$. Observe that $C_i$ is monotone decreasing with respect to containment, and that all the non-zero entries of $w_i$ are at least $s_i$.

Call a single run of the while loop of (1) a phase. During a phase, one sets to 0 precisely all the (still surviving) columns $\mathrm{Col}_c$ such that $c \in \mathrm{supp}(\mathrm{Row}_f)$ in $M^*$, causing $w^*(f)$ to become 0. All these columns get the same strength $m$. The following Lemma establishes some important properties of the strengths.

Lemma 4.1 

1. Let $s_k$ be the maximal strength of a column $\mathrm{Col}_c$ where $c \in \mathrm{supp}(f) \subseteq C$ in the original $M$. Then, $|\mathrm{supp}(\mathrm{Row}_f)| \ge s_k$. In particular, this implies that $w(f) \ge s_k$. Observe, however, that by maximality of $s_k$, during any single phase no more than $s_k$ $c$'s from $\mathrm{supp}(\mathrm{Row}_f)$ are set to 0.

2. $\sum_{c \in C} \frac{1}{s(c)} \le N$, where $N$ is the total number of phases. This parameter is crucial for the forthcoming analysis.

3. The total number of phases $N$ is at most $\mathrm{trk}(M)$.

Proof The first statement directly follows from the definition of the strengths. That is, let $c \in \mathrm{supp}(\mathrm{Row}_f)$ for which $s(c) = s_k$. Then when $s(c)$ is set, $w^*(f) = s_k \le w(f)$. Hence, since each $c' \in \mathrm{supp}(\mathrm{Row}_f)$ contributes exactly 1 to $w^*(f)$, the claim follows.

For the second statement, consider the contribution of a phase of (1) to the left hand side of the inequality. Each column set to 0 contributes $1/s_i$, where $s_i$ is the current $m$ (constant during the phase), while the number of such columns is $w^*(f)$ at the beginning of the phase, which is at most $m$. Thus, the contribution of a phase is at most $s_i \cdot \frac{1}{s_i} \le 1$, which implies the claim.

For the third statement, for each phase $i$, let $f_i \in \mathcal{F}$ be the coordinate that initiated the phase. Mark a $c_i \in C$ such that $\mathrm{Col}_{c_i}$ was set to 0 during the phase. Consider the corresponding $N \times N$ minor of $M$. Clearly, $M(f_i, c_i) = 1$ for all $i$. Since during the $i$'th phase all the surviving columns $\mathrm{Col}_c$ such that $M(f_i, c) = 1$ are removed, it follows that for every $k > i$ and $c_k$ that survives after the $i$'th phase, $M(f_i, c_k) = 0$. Thus, the $N$-minor of $M$ on rows $(f_1, f_2, \ldots, f_N)$ and columns $(c_1, c_2, \ldots, c_N)$ is a nonsingular lower triangular matrix.

■ 

We presently define the sampling procedure to be used in the proof of Theorem 18: 

Definition 9 Let $\rho > 1$ be a parameter to be defined later. For each $c \in C$, define $p_c = \min\{\frac{\rho}{s(c)}, 1\}$, and let $X_c$ be a random variable (indicating whether the column $c$ is chosen) defined by $\Pr(X_c = 1) = p_c$ and $\Pr(X_c = 0) = 1 - p_c$. Choosing the columns randomly and independently according to the specified probabilities, we obtain a random subset of columns $C' = \{c \mid X_c = 1\}$. Finally, setting $\alpha_c = 1/p_c$, we define a random vector $w' = \sum_{c \in C'} \alpha_c \mathrm{Col}_c$.
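The sampling procedure of Definition 9 can be sketched in a few lines. The strengths are taken as given input (here they are hand-computed for a toy matrix), and the function name `sample_columns`, the choice of `rho`, and the seed are assumptions made for illustration only.

```python
import random

def sample_columns(M, strength, rho, rng):
    """Keep column c with probability p_c = min(rho / s(c), 1); reweight kept columns by 1 / p_c."""
    n_cols = len(M[0])
    alpha = [0.0] * n_cols
    for c in range(n_cols):
        p_c = min(rho / strength[c], 1.0)
        if rng.random() < p_c:          # X_c = 1
            alpha[c] = 1.0 / p_c
    # w' is the alpha-weighted sum of the chosen columns.
    w_prime = [sum(row[c] * alpha[c] for c in range(n_cols)) for row in M]
    return alpha, w_prime

# Toy 0/1 matrix whose column strengths are [1, 3, 3, 3]; its column sum is w = [1, 3, 3, 3].
M = [[1, 0, 0, 0],
     [0, 1, 1, 1],
     [0, 1, 1, 1],
     [0, 1, 1, 1]]
strength = [1, 3, 3, 3]

rng = random.Random(0)
alpha, w_prime = sample_columns(M, strength, rho=1.5, rng=rng)
print(alpha, w_prime)
```

Averaging `w_prime` over many independent runs should approach the original column sum, in line with the unbiasedness of the estimator.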

We shall use here the following version of the Chernoff bound (see Theorems A.1.12, A.1.13 in [2]).

Theorem 19 [2] Let $X_1, \ldots, X_n$ be independent Poisson trials such that $\Pr(X_i = 1) = p_i$. Let $S = \sum X_i$ and $\mu = \sum p_i$. Then for any $0 < \delta < 1$, $\Pr[S \notin (1 \pm \delta) \cdot \mu] \le 2e^{-\delta^2 \mu / 3}$.

We start with showing that almost surely the size of $C'$ is $O(\rho N)$.

Lemma 4.2 With probability $1 - o(1)$ the size of $C'$ is at most $2\rho \cdot N \le 2\rho n$.

Proof Since $|C'| = \sum_{c \in C} X_c$, items (2) and (3) of Lemma 4.1 imply that

$$E[|C'|] = \sum_{c \in C} p_c \le \sum_{c \in C} \frac{\rho}{s(c)} \le \rho \cdot N .$$

Since the $X_c$ are independent, Theorem 19 applies, implying that $\Pr(|C'| > 2\rho N) \le 2e^{-\rho N / 3}$. ■

Next, observe that the expectation of $w'$ is $w$:

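Claim 4.1 $E(w') = w$.

Proof $E(w') = E\left[\sum_{c \in C'} \alpha_c \cdot \mathrm{Col}_c\right] = E\left[\sum_{c \in C} X_c \cdot \alpha_c \cdot \mathrm{Col}_c\right] = \sum_{c \in C} (p_c \cdot \alpha_c)\,\mathrm{Col}_c = \sum_{c \in C} \mathrm{Col}_c = w$.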

■ 

The next goal is to show that $w'$ is tightly concentrated around its mean. Since the parameters $p_c$ and $\alpha_c$ of the column $c$ depend solely on its strength $s(c)$, the sequence of strengths $s_1 < s_2 < \cdots < s_t$ defines the sequence of probabilities $p_1 \ge p_2 \ge \cdots \ge p_t$ and the sequence of weights $\alpha_1 \le \alpha_2 \le \cdots \le \alpha_t$. The following claim is easily verified; essentially it is an Abel summation transform:

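Claim 4.2

$$w' = \sum_{c \in C'} \alpha_c \cdot \mathrm{Col}_c = \sum_{i=1}^{t} \Delta_i \sum_{c \in C_i} X_c \cdot \mathrm{Col}_c , \quad \text{where } \Delta_i = \alpha_i - \alpha_{i-1}, \text{ and } \Delta_1 = \alpha_1 .$$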
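The regrouping in Claim 4.2 can be checked numerically on a toy instance. The sketch below computes the weighted sum of chosen columns directly and again level by level, using increments of the weights over the nested families of columns with strength at least $s_i$. The matrix, strengths, weights, and the indicator vector `X` are all hypothetical values chosen for the example.

```python
def direct_sum(M, strength, alpha, X):
    """Sum over chosen columns of alpha_{s(c)} * Col_c, computed row by row."""
    n_cols = len(X)
    return [sum(alpha[strength[c]] * X[c] * row[c] for c in range(n_cols)) for row in M]

def regrouped_sum(M, strength, alpha, X):
    """Same vector via Abel summation over the levels s_1 < s_2 < ... < s_t."""
    n_cols = len(X)
    total = [0.0] * len(M)
    previous = 0.0
    for s_i in sorted(alpha):
        delta = alpha[s_i] - previous        # Delta_i (Delta_1 = alpha_1)
        previous = alpha[s_i]
        for f, row in enumerate(M):
            # z_i(f): chosen columns of strength at least s_i that are incident to f
            z_if = sum(X[c] * row[c] for c in range(n_cols) if strength[c] >= s_i)
            total[f] += delta * z_if
    return total

M = [[1, 1, 0, 0],
     [0, 1, 1, 1],
     [1, 0, 1, 1]]
strength = [1, 2, 2, 4]                      # illustrative strengths per column
alpha = {1: 1.0, 2: 2.0, 4: 4.0}             # weight per strength level (nondecreasing)
X = [1, 0, 1, 1]                             # indicator of the chosen columns

print(direct_sum(M, strength, alpha, X))     # [1.0, 6.0, 7.0]
print(regrouped_sum(M, strength, alpha, X))  # [1.0, 6.0, 7.0]
```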

Let $z_i = \sum_{c \in C_i} X_c \cdot \mathrm{Col}_c$. The key point in the forthcoming lemma is that the random component of $z_i(f)$ is either empty, or has expectation $\ge \rho$, making the Chernoff bound of Theorem 19 applicable. Choosing $\rho$ appropriately, and using the union bound over all $i$, $f$, one arrives at the desired conclusion.

Lemma 4.3 Set $\rho = \frac{3}{\epsilon^2}\left(\ln(2|\mathcal{F}|) + \ln t + k\right)$, where $k > 0$ is any real number, and $t$ is the number of distinct strengths $s_i$. Then, $\Pr[w' \notin (1 \pm \epsilon) w] \le e^{-k}$.

Proof For any fixed $f \in \mathcal{F}$, $z_i(f) = \sum_{c \in C_i} X_c \mathrm{Col}_c(f)$, a sum of independent Boolean variables. The columns with $s(c) \le \rho$ deterministically contribute 1 to this sum, as in this case $p_c = 1$. If there are no other columns, we are done. Else, let $s_k > \rho$ be the maximal strength of the column in $\mathrm{supp}(\mathrm{Row}_i(f))$. By Lemma 4.1(1) there must be at least $s_k$ columns of such strength in this collection, and therefore $E[z_i(f)] \ge s_k p_k \ge s_k \cdot \frac{\rho}{s_k} = \rho$. Thus, by Theorem 19,

$$\Pr\left[z_i(f) \notin (1 \pm \epsilon) \cdot E[z_i(f)]\right] \le 2e^{-\epsilon^2 E[z_i(f)]/3} \le 2e^{-\epsilon^2 \rho / 3} . \qquad (7)$$

Substituting the proposed value for $\rho$, we conclude that the above probability is at most $|\mathcal{F}|^{-1} \cdot t^{-1} \cdot e^{-k}$. Taking the union bound over all $i = 1, 2, \ldots, t$ and $f \in \mathcal{F}$, we conclude that the probability that there exist $i$, $f$ with $z_i(f) \notin (1 \pm \epsilon) \cdot E[z_i(f)]$ is at most $e^{-k}$. Keeping in mind that $w' = \sum_{i=1}^{t} \Delta_i \cdot z_i$, the statement follows. ■

Choosing $k$ to be a large enough constant, and keeping in mind that $t \le N \le \mathrm{trk}(M)$, Lemma 4.2 implies that $C'$ is almost surely of size at most $2\rho N = O(\mathrm{trk}(M) \log(|\mathcal{F}|)/\epsilon^2)$. On the other hand, by Lemma 4.3, $w' = \sum_{c \in C'} \alpha_c \cdot \mathrm{Col}_c$ almost surely distorts $w$ by at most a $(1 + \epsilon)/(1 - \epsilon)$ multiplicative factor. This establishes Theorem 18. ■

Algorithmic considerations: Recall that for simplicity of presentation, instead of working with a weighted set $C$, we have worked with a unit-weighted multiset obtained by producing $\lambda_c$ duplicates of each $c$. Due to scalability, we could assume that $\lambda_c$ is a huge integer, and the rounding issue does not arise. While it indeed simplifies the presentation, this approach results in a very inefficient randomized procedure for selecting the sparser family $C'$ of Theorem 18. However, observe that since the duplicates of $c$ are sampled randomly and independently with the same probability $p_c$, the resulting total weight of $c$ is distributed according to a binomial distribution, and can be efficiently produced. When weights are not integers, we may simulate the process by massive scaling, which leads to sampling according to the Poisson distribution with parameter $\lambda_c$. A detailed discussion of this issue can be found in [7] (see Section 2.4 and Theorem A.1 there). The resulting sparsification procedure
can be implemented in time 0(n^ • |C|). 

To conclude the discussion of this section, let us remark that for large $\mathcal{F}$'s, sometimes a better upper bound can be obtained, as in the original result of [7], by strengthening Eq. (7) in the proof of Lemma 4.3. Instead of using a uniform lower bound on the expected value of $z_i(f)$, one may sometimes rely on finer distributional properties of this random variable, and get significantly stronger results.



4.1.1 The Second Technique 

Here M does not have to be Boolean, just nonnegative. The key parameter of M will be, as in Theorem 12, the 
minimum possible rank of (Hadamard) square root of M. 

Definition 10 For $D \ge 1$, define $\mathrm{rank}^*_D(M)$ as the minimum rank over all matrices $Y$ such that for all $i, j$ it holds that $M_{ij} \le Y_{ij}^2 \le D \cdot M_{ij}$. Equivalently, $M \le Y \circ Y \le DM$, where $\circ$ stands for the Hadamard (i.e., entrywise) product of matrices. In particular, let $\mathrm{rank}^*(M) = \mathrm{rank}^*_1(M)$.

Theorem 20 Let $M$ be a matrix as before, $\lambda$ a nonnegative weighting of $C$, and $w = M\lambda$. Then, for any $0 < \epsilon < 1$, there exists (and is efficiently constructible) another nonnegative weighting $\lambda'$ of $C$, such that the support of $\lambda'$ is of size at most $O(\mathrm{rank}^*_D(M) / \epsilon^2)$, and $w' = M\lambda'$ (entrywise) satisfies $(1 - \epsilon) \cdot M\lambda \le M\lambda' \le D \cdot (1 + \epsilon) \cdot M\lambda$.

Observe that $\mathrm{rank}^*_D(M) \ge \mathrm{trk}(M)$ for any $D$.

The powerful technical tool we are going to employ (implicitly) appears in its strongest form in the recent important paper [6]:






Theorem 21 [6] Let $B_{m \times n}$ be a real valued matrix, and let $Q_{n \times n}$ be $Q = B^T B$. Then, for every $\epsilon > 0$ there exists (and can be efficiently constructed) a nonnegative diagonal matrix $A_{m \times m}$ with at most $O(\epsilon^{-2} n)$ (or even $O(\epsilon^{-2}\,\mathrm{rank}(Q))$) positive entries, and with the following property. Let $\tilde{Q} = B^T A B$. Then, for every $x \in \mathbb{R}^n$ it holds:

$$(1 - \epsilon) \cdot x^T Q x \le x^T \tilde{Q} x \le (1 + \epsilon) \cdot x^T Q x .$$

Actually, [6] is solely interested in the Laplacian matrices of positively weighted graphs, and the above theorem 
is stated there only for such Q's. However, a close examination of the proof reveals that with a minor change 
(related to rank of Q) it works also for general positive semidefinite symmetric Q's. 

Proof (of Theorem 20) Clearly, it suffices to prove the theorem for $D = 1$. The extension for larger $D$'s is obtained in a trivial manner.

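Let $k = \mathrm{rank}^*(M)$. Our aim is to attach to each $f \in \mathcal{F}$ a vector $x_f \in \mathbb{R}^k$, and to each $c \in C$ a vector $b_c \in \mathbb{R}^k$ such that $x_f \cdot b_c = \pm M(f, c)^{1/2}$. Let $B(\lambda)$ be a $|C| \times k$ matrix whose rows are $\sqrt{\lambda_c}\, b_c$. Then, for each $f$, it holds that

$$x_f^T B(\lambda)^T B(\lambda)\, x_f = \sum_{c \in C} \lambda_c \left(M(f, c)^{1/2}\right)^2 = \sum_{c \in C} \lambda_c M(f, c) = w(f) .$$

Applying Theorem 21 to the matrix $B(\lambda)^T B(\lambda)$, we get the desired $k/\epsilon^2$ sparsification.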

It remains to construct the required $x_f$'s and $b_c$'s. Equivalently, we need to construct the matrices $B_{|C| \times k}$ and $X_{k \times |\mathcal{F}|}$ such that $(BX)^T \circ (BX)^T = M$. However, it is given that there exists (and can be efficiently found) a matrix $Y_{|\mathcal{F}| \times |C|}$ of rank $k$ such that $Y \circ Y = M$. Using a standard linear algebra argument, we (efficiently) decompose $Y^T$ as $Y^T = BX$, where $B$ and $X$ are matrices as required above. ■
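The identity at the heart of this proof can be verified numerically on a toy instance. In the sketch below the factors `B` and `X` are chosen by hand (so $k = 2$), the matrix $M$ is defined as the entrywise square of $(BX)^T$, and the quadratic form of $B(\lambda)^T B(\lambda)$ at each $x_f$ is compared with $w(f)$. All numeric values and helper names are assumptions for illustration; the sparsification step of Theorem 21 itself is not implemented here.

```python
import math

def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def quadratic_form(Q, x):
    return sum(x[i] * Q[i][j] * x[j] for i in range(len(x)) for j in range(len(x)))

B = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]    # |C| x k, row c is b_c
X = [[1.0, 2.0], [3.0, -1.0]]               # k x |F|, column f is x_f
lam = [2.0, 1.0, 0.5]                       # nonnegative weights on the columns c

Yt = matmul(B, X)                           # Y^T = BX, a |C| x |F| matrix
n_c, n_f = len(B), len(X[0])
M = [[Yt[c][f] ** 2 for c in range(n_c)] for f in range(n_f)]     # M = Y o Y
w = [sum(M[f][c] * lam[c] for c in range(n_c)) for f in range(n_f)]

# B(lambda): rows sqrt(lambda_c) * b_c; Q = B(lambda)^T B(lambda) is k x k.
B_lam = [[math.sqrt(l) * entry for entry in row] for row, l in zip(B, lam)]
Q = matmul(transpose(B_lam), B_lam)
forms = [quadratic_form(Q, [X[0][f], X[1][f]]) for f in range(n_f)]

print(w)       # [19.0, 9.5]
print(forms)   # agrees with w up to floating-point rounding
```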

4.2 Additional Remarks and Examples 

4.2.1 co-Circuits in Matroids. 

The argument used for bounding the triangular rank of $M$ employed in the proof of Theorem 13 actually applies in the much more general case when the rows of $M$ are indexed by the elements of a matroid $\mathcal{M}$, and the columns of $M$ are indexed by co-circuits (or circuits) of $\mathcal{M}$. One needs only to observe that the intersection of a circuit and a co-circuit in $\mathcal{M}$ cannot be a single element. The conclusion is that in the general case, $\mathrm{trk}(M)$ is at most the size of a maximum independent set in $\mathcal{M}$.

4.2.2 Splitting Set Systems. 

The Boolean matrix $M$ used in the proof of Theorem 18 for $d = 1$ (i.e., the inclusion matrix of edges vs. edge cuts) could be described somewhat differently using vertices instead of edges. Then, the rows correspond to subsets $e$ of $V$ of size 2, the columns correspond to nontrivial subsets $A$ of $V$, and $M(e, A) = 1$ iff $|e \cap A| = 1$. This situation is a special case of what we call a splitting set system, and the claim that $\mathrm{trk}(M) = |V| - 1$ turns out to be a special case of a more general theorem.

Let $\mathcal{F}, C \subseteq 2^V$ be any two families of subsets of $V$. For every $f \in \mathcal{F}$ and $c \in C$, say that $c$ splits $f$ if $c \cap f \ne \emptyset$ and $\bar{c} \cap f \ne \emptyset$. Define the incidence matrix $M$ by $M(f, c) = 1$ if $c$ splits $f$, and $M(f, c) = 0$ otherwise.

Theorem 22 Let $M$ be the incidence matrix as above. Then, $\mathrm{trk}(M) \le |V| - 1$.

Proof Let $Q$ be a square $N \times N$ lower triangular nonsingular minor of $M$. Let the rows be indexed by $\{f_i\}_{i=1}^{N}$ and the columns be indexed by $\{c_i\}_{i=1}^{N}$, in this order. It means, in particular, that $c_i$ always splits $f_i$, but $c_j$ with $j > i$ does not split $f_i$. Consider the partition of $V$, the underlying set, induced by the family $\{c_{i+1}, \ldots, c_N\}$. Since no $c_j$ in it splits $f_i$, $f_i$ must be contained in a single atom of the partition. Since $c_i$ splits $f_i$, the partition induced by $\{c_i, c_{i+1}, \ldots, c_N\}$ must strictly refine the previous partition. Therefore, the number of atoms in the partition induced by $\{c_1, c_2, \ldots, c_N\}$ is at least $N + 1$. But then $N + 1 \le |V|$, and the statement follows. ■
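Theorem 22 can be sanity-checked by brute force on a tiny ground set. The sketch below builds the splitting incidence matrix for $|V| = 4$, with rows indexed by all subsets of size 2 or 3 and columns by all nontrivial subsets, and computes its triangular rank by exhaustive search. The choice of families and the helper names are assumptions for the example, and the search is exponential, so it is suitable only for very small inputs.

```python
from itertools import combinations

def triangular_rank(M):
    """Largest k with rows f_1..f_k, columns c_1..c_k, M[f_i][c_i] != 0, M[f_i][c_j] == 0 for j > i."""
    n_rows = len(M)
    n_cols = len(M[0]) if M else 0
    best = 0

    def extend(chosen_rows, size):
        nonlocal best
        best = max(best, size)
        for c in range(n_cols):
            # The next column must vanish on every previously chosen row.
            if any(M[f][c] != 0 for f in chosen_rows):
                continue
            for f in range(n_rows):
                if M[f][c] != 0:
                    extend(chosen_rows + [f], size + 1)

    extend([], 0)
    return best

def splitting_matrix(family_f, family_c):
    # c splits f iff f meets both c and the complement of c.
    return [[1 if (f & c) and (f - c) else 0 for c in family_c] for f in family_f]

V = range(4)
subsets = [frozenset(s) for r in range(1, 4) for s in combinations(V, r)]
family_f = [s for s in subsets if len(s) >= 2]   # rows: subsets of size 2 or 3
family_c = subsets                               # columns: all nontrivial subsets

M = splitting_matrix(family_f, family_c)
rank = triangular_rank(M)
print(rank, len(V) - 1)                          # 3 3
```

On this instance the bound is attained, which matches the edge-versus-cut case mentioned above.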



4.2.3 Random Boolean Matrices. 

Let $M$ be a random $m \times n$ Boolean matrix, $m > n$. Then, by a standard probabilistic method argument, $\mathrm{trk}(M) = O(\min\{\log(m), n\})$ almost surely. The trivial details are omitted.

4.2.4 An Application to Geometric Discrepancy. 

We conclude the paper with an example of an application of the sparsification methods of this section to a natural purely geometric question with a discrepancy flavor.

The general problem is as follows. Assume we have a family $\mathcal{F}$ of bodies in $\mathbb{R}^d$. The goal is to produce a small sampling set $P \subset \mathbb{R}^d$, i.e., a set of points with associated positive weights, such that for every body $B \in \mathcal{F}$ it holds that $\sum_{p \in P \cap B} w_p = (1 \pm \epsilon)\,\mathrm{vol}^{(d)}(B)$, where $\mathrm{vol}^{(d)}$ is the Euclidean volume. Unlike the usual discrepancy setting, bodies of small volume are as important as bodies of large volume.

Theorem 23 Let $S$ be a set of $n$ points in the plane, and let $\mathcal{F}$ be the family of all closed non-self-intersecting polygons with vertices in $S$. Then, there exists a sampling set $P$ for $\mathcal{F}$ as above of size $O(n^2 / \epsilon^2)$. Moreover, such $P$ can be efficiently constructed in time polynomial in $n$.

Proof First, observe that it suffices to establish the theorem for the triangles with vertices in $S$, since all other polygons in $\mathcal{F}$ can be triangulated, and thus are disjoint unions of such triangles (ignoring the boundaries). Treating these triangles as a 2-dimensional realization of $K_n$, and associating with each point $p \in \mathbb{R}^2$ a geometrical 2-hypercut (as in the proof of Theorem 9), we conclude that the induced Euclidean volume on $K_n$ is a geometrical $\ell_1$ volume. Thus, by Theorem 16, this 2-volume can be $(1 \pm \epsilon)$-approximated by a geometrical $\ell_1$ 2-volume of cut-dimension $O(n^2 / \epsilon^2)$. Moreover, since $\mathrm{supp}(\lambda') \subseteq \mathrm{supp}(\lambda)$, the approximating $\ell_1$ 2-volume is induced by a weighted sampling set of points $P$ of this size.

In order to produce $P$ in polynomial time, first compute the $O(n^4)$ cells created by the lines spanned by $S$. The initial sampling set $P_0$ will have a point $p$ in the interior of each such cell, with the associated weight $w_p$ being the area of the cell. Clearly, $P_0$ samples $\mathcal{F}$ without errors, but it is too big. Next, apply the procedure underlying Theorem 16 to this input to obtain the required $P \subseteq P_0$. In particular, this involves finding the representation of each geometrical 2-hypercut corresponding to $p \in P_0$ as a real 2-coboundary. I.e., we need to suitably assign each directed 1-simplex over $S$, $e = (s_1, s_2)$, a real value $x_e$. The easiest way to do it is by setting $x_e$ to be the angle between $s_1$ and $s_2$ with respect to $p$, in clockwise direction, normalized by $2\pi$.

All this can obviously be done in polynomial time. ■ 
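The angle-based assignment can be illustrated with a short sketch. It assumes one particular reading of the construction: the value on a directed edge $(s_1, s_2)$ is the signed angle from $s_1$ to $s_2$ as seen from $p$, counted positive in the clockwise direction and divided by $2\pi$. Under this convention the values on the three directed boundary edges of a triangle sum to $\pm 1$ when $p$ lies inside the triangle and to 0 when it lies outside. The function names and the coordinates are hypothetical.

```python
import math

def edge_value(p, s1, s2):
    """Signed clockwise angle from s1 to s2 as seen from p, normalized by 2*pi (assumed convention)."""
    ux, uy = s1[0] - p[0], s1[1] - p[1]
    vx, vy = s2[0] - p[0], s2[1] - p[1]
    cross = ux * vy - uy * vx
    dot = ux * vx + uy * vy
    # atan2(cross, dot) is the counterclockwise signed angle; negate it for clockwise.
    return -math.atan2(cross, dot) / (2 * math.pi)

def boundary_sum(p, a, b, c):
    """Sum of the edge values over the directed boundary (a,b), (b,c), (c,a) of a triangle."""
    return edge_value(p, a, b) + edge_value(p, b, c) + edge_value(p, c, a)

a, b, c = (0.0, 0.0), (4.0, 0.0), (0.0, 4.0)   # counterclockwise triangle
inside, outside = (1.0, 1.0), (5.0, 5.0)

print(round(boundary_sum(inside, a, b, c), 9))        # -1.0
print(abs(round(boundary_sum(outside, a, b, c), 9)))  # 0.0
```

Reversing the orientation of the triangle flips the sign of the sum for an interior point, which is consistent with the values behaving as a coboundary on directed simplices.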

Theorem 23 generalizes to higher dimension without difficulty for d-simplices, and more generally, for 
triangulable polytopes over S.